Apache Rya
Overview
Apache Rya is a scalable RDF store built on top of a columnar index store (such as Accumulo). It is implemented as an extension to RDF4J to provide easy query mechanisms (SPARQL, SERQL, etc.) and RDF data storage (RDF/XML, N-Triples, etc.).
Rya stands for RDF y(and) Accumulo.
Manual
A copy of the Apache Rya Manual is located here. The material in the manual and below may be out of sync.
Upgrade Path
Since the data encodings changed in the 3.2.2 release, you will need to run the Upgrade322Tool MapReduce job to perform the upgrade.
Build the project with -Pmr to build the MapReduce artifacts.
Make sure to clone the Rya tables before doing the upgrade (a shell sketch follows below).
Run:
hadoop jar accumulo.rya-mr.jar org.apache.rya.accumulo.mr.upgrade.Upgrade322Tool -Dac.instance={} -Dac.username={} -Dac.pwd={}
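Cloning can be done from the Accumulo shell with the clonetable command. A minimal sketch, assuming the default triplestore_ table prefix (the *_backup clone names are illustrative only):
clonetable triplestore_spo triplestore_spo_backup
clonetable triplestore_po triplestore_po_backup
clonetable triplestore_osp triplestore_osp_backup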
Quick Start VM
A quickstart Vagrant VM is available here.
Quick Start
This tutorial will outline the steps needed to get quickly started with the Rya store using the web based endpoint.
Prerequisites
You will need a running columnar store (Accumulo, or MongoDB if building with that profile), the Rya source code, and Maven to build it.
Building from Source
Using Git, pull down the latest code from the url above.
Run the command mvn clean install to build the code.
If all goes well, the build should be successful and a war should be produced in web/web.rya/target/web.rya.war.
Note: The following profiles are available to tailor the build:

Profile ID     Purpose
geoindexing    perform a build of the geomesa/lucene indexing
mongodb        build with MongoDB configuration (defaults to Accumulo)
To run the build with the profile ‘geoindexing’, run mvn clean install -P geoindexing.
Note: If you are building on Windows, you will need hadoop-common 2.6.0’s winutils.exe and hadoop.dll. You can download them from here. This build requires the Visual C++ Redistributable for Visual Studio 2015 (x64). You will also need to set your Hadoop home and path using the commands below:
set HADOOP_HOME=c:\hadoop-common-2.6.0-bin
set PATH=%PATH%;c:\hadoop-common-2.6.0-bin\bin
Deployment Using Tomcat
Unwar the above war into the webapps directory.
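For example (paths here are illustrative and assume a standard Tomcat layout):
mkdir $TOMCAT_HOME/webapps/web.rya
unzip web/web.rya/target/web.rya.war -d $TOMCAT_HOME/webapps/web.rya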
To point the web.rya war to the appropriate database instance, create a properties file named environment.properties and put it on the classpath.
Here is an example for accumulo:
# Accumulo instance name
instance.name=accumulo
# Accumulo Zookeepers
instance.zk=localhost:2181
# Accumulo username
instance.username=root
# Accumulo password
instance.password=secret
# Rya Table Prefix
rya.tableprefix=triplestore_
# To display the query plan
rya.displayqueryplan=true
Please consult the Accumulo, ZooKeeper, and Hadoop documentation for help with setting up these prerequisites.
Here is an example for MongoDB (populate the user/userpassword properties if authentication to MongoDB is required):
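A minimal sketch of such a file follows; the key names are assumptions based on the MongoDB-backed configuration and should be verified against the manual:
# MongoDB hostname (assumed key names; verify against the manual)
mongo.db.instance=localhost
# MongoDB port
mongo.db.port=27017
# MongoDB database name
mongo.db.name=rya
# Rya collection prefix
mongo.db.collectionprefix=rya_
# MongoDB user (leave blank if authentication is disabled)
mongo.db.user=
# MongoDB user password
mongo.db.userpassword=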
Start the Tomcat server:
./bin/startup.sh
Usage
Load Data
Direct Code
Here is a code snippet for running directly against Accumulo. You will need at least accumulo.rya.jar, rya.api, and rya.sail.impl on the classpath, plus their transitive dependencies. I find that Maven is the easiest way to get a project dependency tree set up.
// Connect to the Accumulo instance through ZooKeeper.
Connector connector = new ZooKeeperInstance("instance", "zoo1,zoo2,zoo3").getConnector("user", "password");

// Set up the Rya DAO against Accumulo.
final RdfCloudTripleStore store = new RdfCloudTripleStore();
AccumuloRyaDAO crdfdao = new AccumuloRyaDAO();
crdfdao.setConnector(connector);

AccumuloRdfConfiguration conf = new AccumuloRdfConfiguration();
conf.setTablePrefix("rya_");
conf.setDisplayQueryPlan(true);
crdfdao.setConf(conf);
store.setRyaDAO(crdfdao);

// Optional: enable the inference engine.
InferenceEngine inferenceEngine = new InferenceEngine();
inferenceEngine.setRyaDAO(crdfdao);
inferenceEngine.setConf(conf);
store.setInferenceEngine(inferenceEngine);

// Wrap the store in an RDF4J Repository and run a SPARQL query.
Repository myRepository = new RyaSailRepository(store);
myRepository.initialize();

String query = "select * where {\n" +
        "<http://mynamespace/ProductType1> ?p ?o.\n" +
        "}";
RepositoryConnection conn = myRepository.getConnection();
System.out.println(query);
TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, query);

// Stream the results to stdout as SPARQL/XML.
TupleQueryResultHandler writer = new SPARQLResultsXMLWriter(System.out);
tupleQuery.evaluate(writer);

conn.close();
myRepository.shutDown();
Web REST endpoint
The war sets up a Web REST endpoint at http://server/web.rya/loadrdf that allows POSTed data to be loaded into the RDF store. This short tutorial will use Java code to post data.
First, you will need data to load and will need to figure out what format that data is in.
For this sample, we will use the N-Triples shown below; save this file somewhere as $RDF_DATA.
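A minimal stand-in, consistent with the <http://mynamespace/ProductType1> subject queried elsewhere on this page (both triples are illustrative only):
<http://mynamespace/ProductType1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mynamespace/ProductType> .
<http://mynamespace/ProductType1> <http://www.w3.org/2000/01/rdf-schema#label> "Example product type" .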
Second, use the following Java code to POST the data to the REST endpoint:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
public class LoadDataServletRun {

    public static void main(String[] args) {
        try {
            // Read the RDF file from the classpath ($RDF_DATA is a placeholder for your file).
            final InputStream resourceAsStream = Thread.currentThread().getContextClassLoader()
                    .getResourceAsStream("$RDF_DATA");
            // The "format" query parameter tells the endpoint how to parse the payload.
            URL url = new URL("http://server/web.rya/loadrdf" +
                    "?format=N-Triples" +
                    "");
            URLConnection urlConnection = url.openConnection();
            urlConnection.setRequestProperty("Content-Type", "text/plain");
            urlConnection.setDoOutput(true);

            // Stream the file to the endpoint as the POST body.
            final OutputStream os = urlConnection.getOutputStream();
            int read;
            while ((read = resourceAsStream.read()) >= 0) {
                os.write(read);
            }
            resourceAsStream.close();
            os.flush();

            // Print the endpoint's response.
            BufferedReader rd = new BufferedReader(new InputStreamReader(
                    urlConnection.getInputStream()));
            String line;
            while ((line = rd.readLine()) != null) {
                System.out.println(line);
            }
            rd.close();
            os.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Compile and run the code above, changing the $RDF_DATA reference and the url where your RDF war is running.
The default “format” is RDF/XML, but these formats are supported: RDFXML, NTRIPLES, TURTLE, N3, TRIX, TRIG.
Bulk Loading data
Bulk loading data is done through MapReduce jobs.
Bulk Load RDF data
This MapReduce job will read a full file into memory and parse it into statements. The statements are then saved into the store. Here is an example for storing in Accumulo:
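The exact jar name and tool class depend on your build, so treat the following invocation as a sketch (the jar path, class name, and HDFS path are assumptions):
hadoop jar rya.mapreduce-shaded.jar org.apache.rya.accumulo.mr.tools.RdfFileInputTool -Dac.zk=localhost:2181 -Dac.instance=accumulo -Dac.username=root -Dac.pwd=secret -Drdf.tablePrefix=triplestore_ -Drdf.format=N-Triples /path/in/hdfs/ntriples.ntrips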
Options:
rdf.tablePrefix : The tables (spo, po, osp) are prefixed with this qualifier. The tables become (rdf.tablePrefix)spo, (rdf.tablePrefix)po, (rdf.tablePrefix)osp.
ac.* : Accumulo connection parameters.
rdf.format : See RDFFormat from RDF4J; samples include Trig, N-Triples, and RDF/XML.
io.sort.mb : The higher the value, the faster the job runs; note that each mapper will need at least this much RAM.
The argument is the directory/file to load. This file needs to be loaded into HDFS before running.
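For example, the file can be copied into HDFS with (paths illustrative):
hadoop fs -put ntriples.ntrips /path/in/hdfs/ntriples.ntrips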
Direct RDF4J API
Here is some sample code to load data directly through the RDF4J API (loading N-Triples data).
You will need at least accumulo.rya.jar, rya.api, and rya.sail.impl on the classpath, plus their transitive dependencies. I find that Maven is the easiest way to get a project dependency tree set up.
final RdfCloudTripleStore store = new RdfCloudTripleStore();
AccumuloRdfConfiguration conf = new AccumuloRdfConfiguration();
AccumuloRyaDAO dao = new AccumuloRyaDAO();

// Connect to Accumulo and point the DAO at it.
Connector connector = new ZooKeeperInstance("instance", "zoo1,zoo2,zoo3").getConnector("user", "password");
dao.setConnector(connector);
conf.setTablePrefix("rya_");
dao.setConf(conf);
store.setRyaDAO(dao);

// Wrap the store in an RDF4J Repository.
Repository myRepository = new RyaSailRepository(store);
myRepository.initialize();
RepositoryConnection conn = myRepository.getConnection();

// Load data from an N-Triples file.
final File file = new File("ntriples.ntrips");
conn.add(new FileInputStream(file), file.getName(),
        RDFFormat.NTRIPLES, new Resource[]{});
conn.commit();

conn.close();
myRepository.shutDown();
Query Data
Web JSP endpoint
Open a url to http://server/web.rya/sparqlQuery.jsp. This simple form can run SPARQL queries.
Web REST endpoint
The war sets up a Web REST endpoint at http://server/web.rya/queryrdf that allows GET requests with queries.
For this sample, we will assume you have already loaded data from the [loaddata.html] tutorial.
Use the following Java code to query the REST endpoint:
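A minimal sketch of such a client follows; the class name mirrors the load example, and it assumes the endpoint accepts the SPARQL string in a GET parameter named query (the server URL and query are placeholders):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class QueryDataServletRun {

    public static void main(String[] args) {
        try {
            // SPARQL query to run; matches the sample data loaded earlier.
            String query = "select * where { <http://mynamespace/ProductType1> ?p ?o . }";

            // The "query" parameter name is an assumption based on the endpoint description.
            URL url = new URL("http://server/web.rya/queryrdf?query="
                    + URLEncoder.encode(query, "UTF-8"));
            URLConnection urlConnection = url.openConnection();

            // Print the response returned by the endpoint.
            BufferedReader rd = new BufferedReader(new InputStreamReader(
                    urlConnection.getInputStream()));
            String line;
            while ((line = rd.readLine()) != null) {
                System.out.println(line);
            }
            rd.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}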
Compile and run this code, changing the url where your RDF war is running.