Merging EOL and Freebase

August 25th, 2010 — 5:41pm

I wrote a simple Acre application that allows users to easily add EOL keys to Freebase organism classification.

Encyclopedia of Life is an ambitious project to create “... electronic page for each species of organism on Earth ...” and Freebase is an open structured data repository recently purchased by Google.

Go to app: http://eolfetch.freebaseapps.com/ (you need to have a Freebase account)

View Comments | Software

ActiveMQ flow control and Apache Camel transacted route gotchas

July 9th, 2010 — 9:02pm

We have a system that uses Apache Camel and ActiveMQ. It handles periodic bursts of 20,000 messages. The end consumer is slow, and it takes about an hour to process all the messages. Along their route, messages are passed from one queue to another. On our production system we stumbled onto an unexpected problem: after processing a few thousand messages, the whole system would freeze.

The problem was with ActiveMQ flow control and transacted routes in Camel.

When you have slow consumers, ActiveMQ limits how much memory the messages in a queue may use, to prevent queues from growing without bound. When the limit is reached, the producer is, by default, forced to wait until resources are freed. You can set these limits at the system level or per queue. The problem arises when you have a transacted Camel route and the system limit is reached.

AMQ Flow Control

A route that moves messages from queue A to queue B runs inside a JMS transaction: you can’t remove a message from queue A until it has been successfully placed on queue B.

If the system limit is reached, no new messages can be sent to any queue. So producers are forced to wait, the transaction doesn’t complete, messages can’t be taken off queue A and no resources get freed. The whole process freezes.

There are numerous ways you can work around this problem. You can turn off flow control and potentially let queues grow indefinitely.

In our case, the solution was to set per-queue limits so that the system limit can never be reached. The sum of the limits for all queues needs to be less than the system limit. That way, as the consumer takes messages from queue B, new messages can come in, transactions can complete and messages can be taken off queue A. Messages are consumed from queues A and B at the same pace and the whole system works fine.

In our case, I’ve set memoryLimit to 10m for our 13 queues and system memoryUsage to 180m.
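The arithmetic behind those numbers can be checked with a small sketch. The class and method names are hypothetical, and the figures are the ones quoted above (13 queues, 10 MB each, 180 MB system limit). It only verifies the budget; it does not configure ActiveMQ.

```java
public class FlowControlBudget {

    /** Returns true when all per-queue limits together stay below the system-wide limit. */
    public static boolean fitsWithinSystemLimit(int queueCount, long perQueueLimitMb, long systemLimitMb) {
        long worstCaseMb = queueCount * perQueueLimitMb; // every queue full at the same time
        return worstCaseMb < systemLimitMb;
    }

    public static void main(String[] args) {
        // 13 queues * 10 MB = 130 MB, which leaves headroom under 180 MB.
        System.out.println(fitsWithinSystemLimit(13, 10, 180)); // true
        // 20 queues * 10 MB = 200 MB could exhaust the system limit.
        System.out.println(fitsWithinSystemLimit(20, 10, 180)); // false
    }
}
```

If the check fails, some queue can still be blocked by the system-wide limit, which is the situation that froze the transacted route.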

For details see :

View Comments | Software

Solr DisMax parser and stop words

May 25th, 2010 — 11:49pm

If you want to use the DisMax parser in Solr, you need to be careful how you index the fields that DisMax will be searching.

If you mix fields that filter out stop words (plain text) and fields that do not filter out stop words (like author names), your simple queries might end up with no results.

By default, DisMax will return only results that contain all the words from your query string. If your query contains a stop word, as in “ants of madagascar”, the stop word “of” might not be found in any of the fields (it’s not in the author names and it’s filtered out of the article body), and you will get zero results.
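The effect can be reproduced with a toy model. This is a simplified sketch and not Solr code: the stop-word list, the sample document, the author name and the method names are all made up for illustration, and the minimum-match percentage is simply rounded down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DisMaxStopWordDemo {

    // Hypothetical stop-word list for the plain-text field.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("of", "the", "a"));

    /** Lower-cases and splits on whitespace; optionally drops stop words, as a text field would. */
    public static Set<String> analyze(String text, boolean filterStopWords) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (!(filterStopWords && STOP_WORDS.contains(token))) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** True when at least mmPercent of the query words are found in some field. */
    public static boolean matches(String query, String body, String author, int mmPercent) {
        Set<String> bodyTokens = analyze(body, true);      // stop words removed at index time
        Set<String> authorTokens = analyze(author, false); // names are indexed as they are
        String[] words = query.toLowerCase().trim().split("\\s+");
        int found = 0;
        for (String word : words) {
            if (bodyTokens.contains(word) || authorTokens.contains(word)) {
                found++;
            }
        }
        int required = words.length * mmPercent / 100; // rounded down
        return found >= required;
    }

    public static void main(String[] args) {
        String body = "The ants of Madagascar";
        String author = "Jane Doe";
        System.out.println(matches("ants of madagascar", body, author, 100)); // false
        System.out.println(matches("ants madagascar", body, author, 100));    // true
        System.out.println(matches("ants of madagascar", body, author, 50));  // true
    }
}
```

The first query fails only because “of” exists in neither field's index, even though the document is clearly relevant.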

Possible workarounds:

  • Relax the Minimum Match (mm) requirement.
    Downside: Lowering mm will increase the number of results. An mm of 50% on “ants madagascar” will return all documents that contain “ants” as well as all documents that contain “madagascar”.
  • Do not filter out stop words.
    Downside: Your index can get large and you might get a large number of less relevant results.
  • Use other indexing schemes, such as N-Grams.

This article explains the details.

Also see this and this discussion.

View Comments | Software

XSLT Unicode Horror

May 18th, 2010 — 10:50pm

Different Java XSLT implementations handle UTF-8 characters differently. Here is test code that parses UTF-8 XML into a DOM document and then serializes it using a transformer.

    // source is the XML string; parserClass and transformerClass are the factory class names under test
    System.out.println("    SOURCE:  " + source);
    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(parserClass, TestUnicode.class.getClassLoader());
    Document document = builderFactory.newDocumentBuilder().parse(new InputSource(new StringReader(source)));
    TransformerFactory transformerFactory = TransformerFactory.newInstance(transformerClass, TestUnicode.class.getClassLoader());
    StringWriter writer = new StringWriter();
    // An identity transform: it only serializes the DOM back to text
    transformerFactory.newTransformer().transform(new DOMSource(document), new StreamResult(writer));
    System.out.println("    RESULT:  " + writer.toString());

I tested the following transformers:

  • Xalan 2.7.1:

    • org.apache.xalan.processor.TransformerFactoryImpl
    • org.apache.xalan.xsltc.trax.TransformerFactoryImpl
    • org.apache.xalan.xsltc.trax.SmartTransformerFactoryImpl
  • Sun-Xalan (an internal transformer factory present in Sun JDK 5 and 6):

    • com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
  • Saxon 8.7:

    • net.sf.saxon.TransformerFactoryImpl

Here are the results for the Mathematical Script Capital D character: 𝒟

    TRANSFORMER: org.apache.xalan.processor.TransformerFactoryImpl
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>𝒟</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>&#55349;&#56479;</foo>
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>&#119967;</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>&#55349;&#56479;</foo>
    TRANSFORMER: org.apache.xalan.xsltc.trax.TransformerFactoryImpl
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>𝒟</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>&#55349;&#56479;</foo>
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>&#119967;</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>&#55349;&#56479;</foo>
    TRANSFORMER: org.apache.xalan.xsltc.trax.SmartTransformerFactoryImpl
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>𝒟</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>&#55349;&#56479;</foo>
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>&#119967;</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>&#55349;&#56479;</foo>
    TRANSFORMER: com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>𝒟</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8" standalone="no"?><foo>&#119967;</foo>
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>&#119967;</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8" standalone="no"?><foo>&#119967;</foo>
    TRANSFORMER: net.sf.saxon.TransformerFactoryImpl
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>𝒟</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>𝒟</foo>
        SOURCE:  <?xml version="1.0" encoding="UTF-8"?><foo>&#119967;</foo>
        RESULT:  <?xml version="1.0" encoding="UTF-8"?><foo>𝒟</foo>

Or, summarized in a table:

  π’Ÿ &#119967;
Xalan 2.7.1 &#55349;&#56479; &#55349;&#56479;
Sun-Xalan (Sun JDK 1.5+) &#119967; &#119967;
Saxon 8.7 π’Ÿ π’Ÿ

The results were the same regardless of the parser implementation, Xerces or Saxon.

Xalan’s handling of UTF-8 characters outside the Basic Multilingual Plane, such as this one, seems to be seriously flawed. &#55349; and &#56479; are the two halves of a UTF-16 surrogate pair; written as separate numeric character references they are not valid XML characters, and both the Xerces and Saxon parsers will throw a SAXParseException when trying to parse documents that contain them.
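Where the two numbers in Xalan’s output come from can be shown with the standard `Character` API. This is a small sketch with a hypothetical class name; it does not involve any XSLT library.

```java
public class SurrogatePairDemo {

    /** Returns the UTF-16 code units of a code point as decimal numbers. */
    public static int[] utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        int[] result = new int[units.length];
        for (int i = 0; i < units.length; i++) {
            result[i] = units[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int scriptCapitalD = 119967; // U+1D49F, Mathematical Script Capital D
        int[] units = utf16Units(scriptCapitalD);
        System.out.println(units[0] + " " + units[1]); // 55349 56479
        // Combining the two halves gives back the original code point.
        System.out.println(Character.toCodePoint((char) units[0], (char) units[1])); // 119967
        // Each half on its own is a surrogate, not a character.
        System.out.println(Character.isSurrogate((char) units[0])); // true
    }
}
```

In other words, Xalan appears to escape each UTF-16 code unit separately instead of escaping the code point as a whole.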

View Comments | Software

Nasty bug with generics and introspection

April 19th, 2010 — 5:15pm

    import java.lang.reflect.Method;

    public class Test {

      public static interface Foo<T> {
        public T getX();
      }

      public static class Bar implements Foo<String> {
        public String getX() {
          return "Hello World";
        }
      }

      public static void main(String[] args) {
        // List every method declared directly on Bar
        Method[] methods = Bar.class.getDeclaredMethods();

        for (Method method : methods) {
          System.out.println(method.getReturnType().toString()+" "+method.getName());
        }
      }
    }

This prints:

    class java.lang.String getX
    class java.lang.Object getX

A) It’s illegal in Java to have two methods with the same signature returning different types.

B) The order in which these methods are returned is not guaranteed. For example, this can cause BeanUtils.copyProperties(..) to intermittently fail to copy some bean properties: BeanUtils takes the first get method returned, finds that its return type does not match the corresponding set method, and skips the property.

The bug is present in both Java 5 and 6. There are several bugs filed around this problem. For example: 6422403 and 6528714. The bad news is that this is not going to be fixed until Java 7.
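One possible workaround in your own reflection code, sketched here on the assumption that the Object-returning duplicate is the bridge method the compiler generates for the generic interface, is to skip methods for which `Method.isBridge()` returns true. The class and helper names are hypothetical, and this does not change how BeanUtils or the JDK introspector behave.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class BridgeMethodDemo {

    public interface Foo<T> {
        T getX();
    }

    public static class Bar implements Foo<String> {
        public String getX() {
            return "Hello World";
        }
    }

    /** Returns the declared methods of a class, skipping compiler-generated bridge methods. */
    public static List<Method> declaredNonBridgeMethods(Class<?> type) {
        List<Method> result = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (!method.isBridge()) {
                result.add(method);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Method method : declaredNonBridgeMethods(Bar.class)) {
            // Prints only: java.lang.String getX
            System.out.println(method.getReturnType().getName() + " " + method.getName());
        }
    }
}
```

With the bridge method filtered out, only one getX remains and the ordering no longer matters.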

View Comments | Software

RDFa on PLoS

October 24th, 2009 — 8:10am

After the last release, all PLoS articles are tagged with RDFa.

For example, go to this article. To view the triples from the page, use the Firefox Operator plugin or GetN3. You should see the following Dublin Core triples.

    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/language> "en" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/publisher> "Public Library of Science" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/type> <http://purl.org/dc/dcmitype/Text> .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/title> "Anesthetics Rapidly Promote Synaptogenesis during a Critical Period of Brain Development" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/description> "
    Experience-driven activity plays an essential role in the development of brain circuitry during critical periods of early postnatal life, a process that depends upon a dynamic balance between excitatory and inhibitory signals. Since general anesthetics are powerful pharmacological modulators of neuronal activity, an important question is whether and how these drugs can affect the development of synaptic networks. To address this issue, we examined here the impact of anesthetics on synapse growth and dynamics. We show that exposure of young rodents to anesthetics that either enhance GABAergic inhibition or block NMDA receptors rapidly induce a significant increase in dendritic spine density in the somatosensory cortex and hippocampus. This effect is developmentally regulated; it is transient but lasts for several days and is also reproduced by selective antagonists of excitatory receptors. Analyses of spine dynamics in hippocampal slice cultures reveals that this effect is mediated through an increased rate of protrusions formation, a better stabilization of newly formed spines, and leads to the formation of functional synapses. Altogether, these findings point to anesthesia as an important modulator of spine dynamics in the developing brain and suggest the existence of a homeostatic process regulating spine formation as a function of neural activity. Importantly, they also raise concern about the potential impact of these drugs on human practice, when applied during critical periods of development in infants.
    " .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/identifier> <http://dx.doi.org/10.1371/journal.pone.0007043> .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/date> "2009-09-16"^^<http://www.w3.org/2001/XMLSchema-datatypes#date> .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/subject> "Anesthetic Mechanisms" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/subject> "Anesthesiology and Pain Management" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/subject> "Developmental Biology" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/subject> "Neurodevelopment" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/subject> "Neuroscience" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Mathias De Roo" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Paul Klauser" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Adrian Briner" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Irina Nikonenko" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Pablo Mendez" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Alexandre Dayer" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Jozsef Z. Kiss" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Dominique Muller" .
    <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007043> <http://purl.org/dc/terms/creator> "Laszlo Vutskits" .

The currently supported DC terms are identifier, title, description, type, language, creator, subject and publisher.

View Comments | Software

Hoy Kid

October 17th, 2009 — 6:43pm

Laurie wrote another article for Mission Local, a blog/news site for San Francisco’s Mission neighborhood. Hoy Kids is turning into a regular column. And my baby boy is on the front page!

Read the article>

View Comments | Mission, Web

Birth – A Play by Karen Brody

September 5th, 2008 — 10:40pm

Laurie performed last weekend in the play Birth, written by Karen Brody and directed by Aimée Miles. It was a performance for Birth Fest 2008 in San Francisco. Birth Fest is an event created to support midwifery and natural birth.

Laurie got extra applause after showing her belly!

View Comments | Art, Society

New Job

August 30th, 2008 — 8:03pm

Next week I will be starting as a software developer at Public Library of Science.

I spent more than 7 years at Xpiron. This is the longest time I have spent at any company. I went through Xpiron’s ups and downs. Xpiron is now a stable company with a solid number of customers and a promising future. It was a great experience.

I am really excited about PLoS. They have a unique architecture: a Mulgara RDF database at the back end, a Fedora BLOB store, Topaz object-triple mapping and the Ambra publishing platform.

This year is a year of change for me. A new job, a new flat and I am becoming a father in October!

View Comments | Job, Web

It’s a Boy!

June 3rd, 2008 — 6:21pm

At 19 weeks


View Comments | Personal
