uprasad/lucene

Lucene Experiments

Indexing

Docker container setup

  1. Clone the repository and cd into the cloned directory
  2. Build the Docker image from the repository root: $ docker build -t <image-name> .
  3. Start a container from that image and open a shell in it: $ docker run -it <image-name> /bin/bash

Indexing tool

The indexing tool can be run without any arguments:

/home$ java -cp lucene.jar:lucene-core-9.11.0.jar indexing.IndexData

The available arguments can be listed with the -help flag:

/home$ java -cp lucene.jar:lucene-core-9.11.0.jar indexing.IndexData -help
java indexing.IndexData [-help]
	[-index INDEX_PATH]
	[-num_docs NUM_DOCS]
	[-update]
	[-docs_per_segment DOCS_PER_SEGMENT]
	[-info_stream INFO_STREAM_FILE]
	[-dict DICT_FILE]
	[-disable_compound_file]
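For scripting experiments, the flags listed by -help can be assembled programmatically. The Python sketch below is a hypothetical helper (index_data_cmd is not part of the repository); it only knows the flag names printed above and makes no assumption about their default values, so any option left unset is simply omitted.

```python
from typing import List, Optional, Tuple

# Classpath as used in the invocations in this README; adjust if the jars live elsewhere.
CLASSPATH = "lucene.jar:lucene-core-9.11.0.jar"


def index_data_cmd(
    index: Optional[str] = None,
    num_docs: Optional[int] = None,
    update: bool = False,
    docs_per_segment: Optional[int] = None,
    info_stream: Optional[str] = None,
    dict_file: Optional[str] = None,
    disable_compound_file: bool = False,
) -> List[str]:
    """Return an argv list for indexing.IndexData (hypothetical helper).

    Options left as None/False are omitted, so IndexData falls back to
    whatever defaults it has built in.
    """
    cmd = ["java", "-cp", CLASSPATH, "indexing.IndexData"]
    # Flags are emitted in the order -help lists them.
    flags: List[Tuple[str, object]] = [
        ("-index", index),
        ("-num_docs", num_docs),
        ("-update", update),
        ("-docs_per_segment", docs_per_segment),
        ("-info_stream", info_stream),
        ("-dict", dict_file),
        ("-disable_compound_file", disable_compound_file),
    ]
    for flag, value in flags:
        if value is None or value is False:
            continue  # option not requested
        if value is True:
            cmd.append(flag)  # boolean switch, takes no value
        else:
            cmd.extend([flag, str(value)])
    return cmd
```

For example, subprocess.run(index_data_cmd(num_docs=5000, docs_per_segment=1000), check=True), run from /home, would start the same indexing job as the strace example below, just without tracing.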

The invocation can be wrapped in strace -tt -ff to dump filesystem trace logs, with microsecond timestamps (-tt) and a separate log file per process/thread (-ff), e.g.:

/home$ mkdir strace_out
/home$ strace -tt -ff \
  -e openat,close,read,write,mmap,lseek,unlink \
  -o strace_out/strace.log \
  java -cp lucene-core-9.11.0.jar:lucene.jar indexing.IndexData -num_docs 5000 -docs_per_segment 1000
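With -tt and -ff -o, each line of a per-pid file starts with a wall-clock timestamp (HH:MM:SS.microseconds), followed by the syscall, its arguments and the return value. A minimal Python sketch of parsing one such line is below; the regex and the Syscall type are illustrative assumptions, not the repository's parsing code, and lines it does not recognize (signals, exit notices, split calls) come back as None.

```python
import re
from typing import NamedTuple, Optional

# Assumed per-pid line shape (strace -tt -ff -o ...; no pid prefix):
#   HH:MM:SS.micro syscall(args) = result [errno text]
LINE_RE = re.compile(
    r"^(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}\.\d+) "
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>0x[0-9a-f]+|-?\d+|\?)"
)


class Syscall(NamedTuple):
    seconds: float      # time of day in seconds since midnight
    name: str           # e.g. "openat", "write"
    args: str           # raw argument text, left unparsed
    ret: Optional[int]  # None when strace printed "?" (e.g. exit_group)


def parse_line(line: str) -> Optional[Syscall]:
    """Parse one complete syscall line; return None for anything else."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # signals, "+++ exited" notices, <unfinished ...> fragments
    seconds = int(m["h"]) * 3600 + int(m["m"]) * 60 + float(m["s"])
    # Base 0 accepts both decimal results and hex pointers such as mmap's.
    ret = None if m["ret"] == "?" else int(m["ret"], 0)
    return Syscall(seconds, m["name"], m["args"], ret)
```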

Merge the per-pid logs (strace_out/strace.log.<pid>) into a single timestamp-ordered file with:

/home$ strace-log-merge strace_out/strace.log > strace_out/strace.log
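strace-log-merge relies on the -tt timestamps to interleave the per-pid files. The Python sketch below illustrates the idea (prefix each line with the pid taken from its file name, then stable-sort by timestamp); merge_logs is a hypothetical stand-in, and the exact output format of the real tool (for example, pid padding) may differ.

```python
import glob
import re
from typing import List, Tuple

TS_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}\.\d+) ")


def ts_key(line: str) -> float:
    """Seconds since midnight from a leading -tt timestamp (-1.0 if none)."""
    m = TS_RE.match(line)
    if m is None:
        return -1.0
    return int(m[1]) * 3600 + int(m[2]) * 60 + float(m[3])


def merge_logs(prefix: str) -> List[str]:
    """Return 'PID line' strings from every PREFIX.<pid> file, ordered by timestamp."""
    tagged: List[Tuple[float, str]] = []
    for path in glob.glob(glob.escape(prefix) + ".*"):
        pid = path[len(prefix) + 1:]
        if not pid.isdigit():
            continue  # only PREFIX.<pid> files are strace -ff output
        with open(path) as f:
            for line in f:
                tagged.append((ts_key(line), f"{pid} {line.rstrip()}"))
    # list.sort is stable: lines with equal timestamps keep their per-file order.
    tagged.sort(key=lambda item: item[0])
    return [text for _, text in tagged]
```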

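Note: -ff is used instead of -f because, with -f, all threads are traced into a single file and strace may split one syscall across two lines (an <unfinished ...> line followed later by a <... resumed> line) when threads interleave, which breaks the parsing code.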
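To illustrate the extra work that -ff avoids: with -f and a single output file, each line is prefixed with the pid, and a call interrupted by another thread appears as a "... <unfinished ...>" line followed later by a "<... NAME resumed>..." line. A Python sketch that stitches such pairs back together is below; rejoin is hypothetical, and the line shapes it matches are assumptions based on typical strace output that may vary between strace versions.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumed strace -f -o FILE line shapes (pid prefix, then -tt timestamp):
#   123 10:00:00.000001 read(5,  <unfinished ...>
#   123 10:00:00.000009 <... read resumed>"abc", 4096) = 3
UNFINISHED_RE = re.compile(r"^(?P<head>(?P<pid>\d+)\s.*) <unfinished \.\.\.>$")
RESUMED_RE = re.compile(r"^(?P<pid>\d+)\s+\S+\s+<\.\.\. \w+ resumed>(?P<rest>.*)$")


def rejoin(lines: Iterable[str]) -> Iterator[str]:
    """Yield whole syscall lines, stitching <unfinished ...>/<... resumed> pairs.

    A stitched call keeps its start timestamp but is yielded when it resumes,
    so the output is in completion order rather than start order.
    """
    pending: Dict[str, str] = {}  # pid -> "PID TIMESTAMP name(partial args "
    for raw in lines:
        line = raw.rstrip("\n")
        m = UNFINISHED_RE.match(line)
        if m:
            # A thread is inside at most one syscall at a time, so one slot per pid suffices.
            pending[m["pid"]] = m["head"]
            continue
        m = RESUMED_RE.match(line)
        if m and m["pid"] in pending:
            yield pending.pop(m["pid"]) + m["rest"]
            continue
        yield line  # complete line, or a resume whose start was never seen
```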

Filesystem activity during indexing can then be visualized by running

/home$ python3 strace_events_viz.py strace_out/strace.log
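As a rough illustration of the kind of analysis the merged log supports, the Python sketch below sums the bytes returned by write calls per file, mapping file descriptors to paths via openat and close. It is a hypothetical example, not what strace_events_viz.py does, and it assumes the merged line shape "PID HH:MM:SS.micro syscall(args) = ret" and that the log covers a single JVM process (whose threads share one fd table).

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed merged-line shape: PID HH:MM:SS.micro syscall(args) = ret [errno text]
CALL_RE = re.compile(r"^\d+\s+\S+\s+(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)")
# openat(dirfd, "path", flags[, mode]): capture the quoted path after the first comma.
OPENAT_PATH_RE = re.compile(r'^[^,]+, "(?P<path>[^"]*)"')


def bytes_written_per_file(lines: Iterable[str]) -> Counter:
    """Sum successful write() return values per path (unknown fds as 'fd N')."""
    fd_to_path: Dict[int, str] = {}  # one map: threads of a process share fds
    written: Counter = Counter()
    for line in lines:
        m = CALL_RE.match(line)
        if m is None:
            continue  # not a complete syscall line
        name, args, ret = m["name"], m["args"], int(m["ret"])
        if name == "openat" and ret >= 0:
            p = OPENAT_PATH_RE.match(args)
            if p:
                fd_to_path[ret] = p["path"]
        elif name == "close":
            fd_to_path.pop(int(args), None)
        elif name == "write" and ret > 0:
            fd = int(args.split(",", 1)[0])
            written[fd_to_path.get(fd, f"fd {fd}")] += ret
    return written
```

Calling it on the lines of strace_out/strace.log and taking .most_common(10) on the result would list the ten most-written paths.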
