Updated: June 11th, 2015

The Problem

I am throwing up a quick post about a relatively cryptic error that Solr started throwing the other day here at Plaxo. After Solr had been running happily for a few days, I suddenly started getting pages about failed indexing.

Upon closer examination, I saw the following repeatedly in the log file:

catalina.2009-09-18.log:SEVERE: java.io.IOException: directory 'DATADIR/index' exists and is a directory, but cannot be listed: list() returned null

I tried sending an OPTIMIZE command to see if it would help, but the server returned the same response.

Digging Deeper

The reason for these errors was quite simple – Solr was running into the system-level limit on the allowed number of open files (ulimit). This limit can be seen by running
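For reference, `ulimit -n` is the standard shell built-in that prints the soft limit on open files, and the same numbers can be read from inside a process. Below is a minimal sketch in Python, assuming a Unix-like host. The helper names `open_file_limits` and `count_open_fds` are made up for illustration, and the descriptor count relies on a Linux-style `/proc` filesystem.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds():
    """Count descriptors currently open in this process.

    Assumes a Linux-style /proc filesystem; returns None when it is absent.
    """
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    used = count_open_fds()
    if used is not None:
        print(f"descriptors open in this process: {used}")
```

Note that this inspects the Python process itself, not Solr. To look at a running Solr instance on Linux, the equivalent information should be under `/proc/<pid>/limits` and `/proc/<pid>/fd` for the servlet container's process ID.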


  • the database
  • CPU
  • memory
  • IO
    • disk latency
    • network latency
  • slow queries
  • medium-size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per service)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow
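The sizing figures and the data layout in the list above can be checked with a little arithmetic and a throwaway table. The sketch below is only an illustration: SQLite stands in for whatever database is actually used, and the table name `metric_data` and the column types are assumptions. One thing the arithmetic shows is that 20 metrics per service across 21,000 services gives 420,000, so the quoted 468,000 implies an average closer to 22 metrics per service.

```python
import sqlite3

# Figures quoted in the list above.
SERVERS = 2_100
SERVICES_PER_SERVER = 10
METRICS_PER_SERVICE = 20
ROWS_PER_DAY = 28_800_000
SECONDS_PER_DAY = 24 * 60 * 60

services = SERVERS * SERVICES_PER_SERVER        # 21,000, matches the list
metrics = services * METRICS_PER_SERVICE        # 420,000 (the list quotes 468,000)
rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY  # average insert rate


def make_table():
    """Create an in-memory table with the three columns and composite key from the list.

    The table name and column types are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE metric_data (
            measurement_id INTEGER NOT NULL,
            timestamp      INTEGER NOT NULL,
            value          REAL,
            PRIMARY KEY (timestamp, measurement_id)
        )
        """
    )
    return conn


if __name__ == "__main__":
    print(f"services: {services:,}")
    print(f"metrics at 20 per service: {metrics:,}")
    print(f"average rows per second: {rows_per_second:.1f}")
    conn = make_table()
    conn.execute("INSERT INTO metric_data VALUES (?, ?, ?)", (1, 1000, 0.5))
    conn.commit()
```

With the primary key on (timestamp, measurement_id), a second row for the same measurement at the same timestamp is rejected, which is the property the outline's key definition implies.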