Setting up HBase for use with Socorro is a bit of a bear! The default Vagrant config sets up a VM with filesystem-only. For those that want to try out the HBase support, or are on a path toward setting up a production instance, these instructions might help you along the way.
You may also be interested in Lars’ recent blog posts about Socorro.
Here’s how I got it all working on an Ubuntu Precise (12.04) system, along with some scripts for launching important processes and putting test crashes into the system so you can tell that it is working. Ultimately, my goal is to incorporate all of this into some setup scripts to help new users out.
Set up HBase and Thrift
Socorro uses the Thrift API to insert new crashes and retrieve them through the middleware layer. These Quickstart instructions are pretty helpful for getting HBase installed.
Then, you need to edit /etc/hosts: remove the ‘127.0.1.1’ entry and add your hostname to the localhost ‘127.0.0.1’ line. It also helps to add ‘crash-stats’ and ‘crash-reports’ as host aliases, since those are the defaults. Your final config line for localhost would look like:
127.0.0.1 localhost wuzetian crash-reports crash-stats
(where wuzetian is your hostname)
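A quick way to sanity-check the edit (plain grep, nothing Socorro-specific):

```shell
# The localhost line should now carry the aliases, and the
# 127.0.1.1 entry should be gone.
grep '^127.0.0.1' /etc/hosts
grep -q '^127.0.1.1' /etc/hosts && echo "127.0.1.1 still present" \
                                || echo "no 127.0.1.1 entry"
```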
You also need to add configuration for HBase. Here’s an example:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///var/tmp/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/var/tmp/zookeeper</value>
</property>
</configuration>
That sets the locations for your HBase and ZooKeeper files. This setup is for testing, so I put the directories in a location I can easily clear out.
Then, to start HBase and Thrift up:
/etc/init.d/hadoop-hbase-master start
/etc/init.d/hadoop-hbase-thrift start
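A hedged sanity check (not from the official docs) to confirm both daemons came up; the Thrift server listens on port 9090 by default:

```shell
# Look for the HBase master JVM and probe the default Thrift port.
pgrep -f HMaster > /dev/null && echo "HBase master: running" \
                             || echo "HBase master: not found"
nc -z localhost 9090 2> /dev/null && echo "Thrift: listening on 9090" \
                                  || echo "Thrift: not reachable on 9090"
```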
Setting up processor tools
The processor that looks at raw crashes runs two tools by default: minidump_stackwalk and exploitable.
You can build these from the socorro source tree with:
make minidump_stackwalk
Then make install
should put these files into a useful location.
You can also just copy the binaries by hand: minidump_stackwalk is in the stackwalk/bin directory, and exploitable is at exploitable/exploitable.
The paths for these are configured in config/processor.ini: exploitability_tool_pathname and minidump_stackwalk_pathname.
There’s also a symbols resolver configured, but I am not setting this up in my test.
Disable LZO compression for HBase (unless you have it configured)
Our HBase schema is configured to use LZO compression by default. Change that to ‘NONE’ and load the schema into HBase:
/bin/cat /home/socorro/dev/socorro/analysis/hbase_schema | sed 's/LZO/NONE/g' | /usr/bin/hbase shell
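If you want to see what the substitution does before loading anything, sed only touches the compression clauses. The table name below is made up for illustration:

```shell
# Dry run of the same substitution on a sample schema line.
echo "{NAME => 'crash_reports', COMPRESSION => 'LZO'}" | sed 's/LZO/NONE/g'
# → {NAME => 'crash_reports', COMPRESSION => 'NONE'}
```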
Set up crashmover
Update two lines in scripts/config/collectorconfig.py:
localFS.default = '/home/socorro/primaryCrashStore'
fallbackFS.default = '/home/socorro/fallback'
Set those to directories where you can store crash dumps.
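Creating them might look like this (the paths are the defaults from the config above; in production you would also chown them to whatever account your daemons run as, e.g. a ‘socorro’ user, which I am assuming here):

```shell
# Create the primary crash store and the filesystem fallback.
mkdir -p /home/socorro/primaryCrashStore /home/socorro/fallback
ls -d /home/socorro/primaryCrashStore /home/socorro/fallback
```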
Configure processor and monitor to use HBase
You need to set the processor and monitor up to use HBase instead of local crash storage.
The easiest way to do this is as follows:
PYTHONPATH=. python socorro/processor/processor_app.py --admin.conf=./config/processor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/processor2.ini
PYTHONPATH=. python socorro/monitor/monitor_app.py --admin.conf=./config/monitor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/monitor2.ini
Then edit the two generated files (processor2.ini and monitor2.ini) to reflect your HBase configuration.
Starting up
The docs suggest starting up four daemons in screen sessions. I mocked up a shell script and a screenrc to get you started.
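Something along these lines, strictly as a sketch: the collector and crashmover module paths below are my guesses patterned after the processor invocation above, so check them against your checkout before relying on this.

```shell
#!/bin/sh
# Sketch: start each daemon in its own detached screen session.
# App names and .ini paths are assumptions from this walkthrough.
cd /home/socorro/dev/socorro || exit 1
for app in collector crashmover monitor processor; do
    screen -dmS "$app" sh -c \
        "PYTHONPATH=. python socorro/${app}/${app}_app.py --admin.conf=./config/${app}.ini"
done
screen -ls    # should list the four sessions
```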
And that’s it! You should now have a working system, with crashes being submitted and stashed into HBase, and the monitor and processor picking up crashes as they arrive and running the stackwalk and exploitable tools against the crashes.
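One way to confirm the whole pipeline is to POST a test crash at the collector. The field names follow the standard Breakpad submission format; test.dmp stands in for a real minidump, and the product and version values are arbitrary:

```shell
curl -F 'ProductName=TestProduct' \
     -F 'Version=1.0' \
     -F 'upload_file_minidump=@test.dmp' \
     http://crash-reports/submit
```

If the submission is accepted, the collector should respond with a CrashID you can then trace through the crashmover and processor logs.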
Please let me know if these instructions work, or don’t work, for you.