Google MR4C (License: GNU Lesser General Public License v3)

Overview

Introduction to the MR4C repo

About MR4C

MR4C is an implementation framework that allows you to run native code within the Hadoop execution framework. Pairing the performance and flexibility of natively developed algorithms with the unfettered scalability and throughput inherent in Hadoop, MR4C enables large-scale deployment of advanced data processing applications.

Map to this repo

This repository includes the user guide, tutorials, and source code for the MR4C framework created by Google Inc. We suggest working through this repo in the following order:

  1. Make sure that you have all dependencies installed, then build (see below).
  2. Test that the MR4C install was successful: run test_mr4c.sh from the test directory.
  3. Study up on MR4C: README.md in the UserGuide directory covers the basic concepts behind MR4C.
  4. Run through the example algorithms in the tutorial directory.
  5. Build your own algorithm using the examples as templates, and let us know if you have questions or comments!

Dependencies

  • tested with Ubuntu 12.04 and CentOS 6.5
  • tested with CDH 5.2.0 (either MRV1 or YARN)
  • ant (1.8.2 min)
  • java (1.6 min)
  • ivy (2.1 min)
  • make (3.8.1 min)
  • g++ (4.6.3 min)
  • log4cxx (0.10.0)
  • jansson (2.2.1 min)
  • cppunit (1.12.1 min)
  • proj4 (4.8.0 min)
  • gdal (1.10 min)

Build

There are four scripts included to build, clean, deploy, and remove MR4C. Build with:

./build_all

Clean previous builds with:

./clean_all

Deploy to /usr/local/mr4c using:

./deploy_all

Remove all components with:

./remove_all

If you get stuck, have questions, or would like to provide any feedback, please don’t hesitate to contact us at [email protected]. Let’s do big things together.

Comments
  • I am unable to ./build_all.


    I get stuck with the following:

    test/cpp/tests/MR4CTests.cpp:17:33: fatal error: cppunit/TestFixture.h: No such file or directory
     #include <cppunit/TestFixture.h>

    Can you provide instructions on how to get the dependencies for the different platforms?

    opened by shyamalschandra 5
  • my map phase looks like looping forever


    After installation, I'd like to practice what I learned from the tutorial. Below is my test code to read and write a file in HDFS.

      void executeAlgorithm(AlgorithmData& data, AlgorithmContext& context)
      {
        Dataset* input = data.getInputDataset("input");
        Dataset* outputHist = data.getOutputDataset("output");
    
        std::set<DataKey> keys = input->getAllFileKeys();
        for ( std::set<DataKey>::iterator i = keys.begin(); i != keys.end(); i++ ) {
          DataKey myKey = *i;
          std::cout << myKey << std::endl;
    
          RandomAccessFile* randIn = input->getDataFileForRandomAccess(myKey);
          DataFile * fileOut = new DataFile("text/plain");
          WritableRandomAccessFile * randOut = outputHist->addDataFileForRandomAccess(myKey, fileOut);
          size_t fileSize = randIn->getFileSize();
          randOut->setFileSize(fileSize);
          size_t read = 0;
          size_t bufferSize = 1024 * 1024;
          char* data = new char[bufferSize];
          read = randIn->read(data, bufferSize);
          delete[] data;
          randIn->close();
          randOut->close();
        }
      }
    

    The map phase looks like it is looping forever even though the input is not that big. The progress goes from 0% up to some percentage again and again (it has now been running for 6+ hours).

    ...
    21345055 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 17% reduce 0%
    21346070 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 0% reduce 0%
    21359136 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 33% reduce 0%
    21360141 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 44% reduce 0%
    21465685 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 0% reduce 0%
    21479771 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 44% reduce 0%
    21585825 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 0% reduce 0%
    21598939 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 44% reduce 0%
    ...
    

    Is something wrong in my test code? Please let me know. Thanks, Jun

    opened by hyunjun 4
  • another build issue


    At the last step, ./tools/build_yarn, all the tests failed at the end, with no other error:

    do-test:
    [junit] TEST com.google.mr4c.nativec.ExternalAlgorithmDataSerializerTest FAILED
    [junit] TEST com.google.mr4c.nativec.ExternalAlgorithmSerializerTest FAILED
    [junit] TEST com.google.mr4c.nativec.ExternalDatasetSerializerTest FAILED
    [junit] TEST com.google.mr4c.nativec.jna.JnaExternalEntryTest FAILED
    [junit] Tests FAILED

    BUILD FAILED. Then there is no jar package in the dist directory. How can I solve this?

    opened by xuyan1972 2
  • Anything wrong in the way to use getBytes() in my code?


      void executeAlgorithm(AlgorithmData& data, AlgorithmContext& context)
      {
        Dataset* input = data.getInputDataset("input");
    
        std::set<DataKey> keys = input->getAllFileKeys();
        for ( std::set<DataKey>::iterator i = keys.begin(); i != keys.end(); i++ ) {
          DataKey myKey = *i;
          DataFile* myFile = input->getDataFile(myKey);
          long fileSize = myFile->getSize();
          char * fileBytes=myFile->getBytes();
        }
      }
    

    Adding the last line, char * fileBytes = myFile->getBytes();, makes it fail, while without that line it succeeds. Is anything wrong in my code?

    opened by hyunjun 2
  • build jnaerate failed


    I tried to build the jnaerate target with ant, but it failed; see the error message below. I have already copied jna-3.4.0.jar and jnaerator-0.11.jar to java/lib. Are there any other jars that jnaerate depends on? Thanks

    jnaerate:

    BUILD FAILED
    java.lang.NoClassDefFoundError: com.ochafik.lang.jnaerator.parser.Identifier$QualifiedIdentifier
        at java.lang.Class.forNameImpl(Native Method)
        at java.lang.Class.forName(Class.java:237)
        at org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:135)
        at org.apache.tools.ant.taskdefs.Java.run(Java.java:764)
        at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:218)
        at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:132)
        at org.apache.tools.ant.taskdefs.Java.execute(Java.java:105)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:619)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:357)
        at org.apache.tools.ant.Target.performTasks(Target.java:385)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1337)
        at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
        at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1189)
        at org.apache.tools.ant.Main.runBuild(Main.java:758)
        at org.apache.tools.ant.Main.startAnt(Main.java:217)
        at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
        at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)
    Caused by: java.lang.ClassNotFoundException: com.ochafik.lang.jnaerator.parser.Identifier$QualifiedIdentifier
        at org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1400)
        at org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1341)
        at org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1094)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:827)
        ... 23 more

    opened by yangzp138 2
  • ABOUT gdal ISSUE


    Now I want to save files into HDFS, then read them through the GDAL interface (RasterIO), then write the result to another HDFS file through the GDAL interface (e.g. Create, RasterIO). Can anyone tell me the code outline of this process?

      Dataset* input = data.getInputDataset("imageIn");
      std::set<DataKey> keys = input->getAllFileKeys();
      Dataset* outputHist = data.getOutputDataset("out");
      for ( std::set<DataKey>::iterator i = keys.begin(); i != keys.end(); i++ ) {
        DataKey myKey = *i;
        DataFile* skyFile = input->getDataFile(myKey);
        std::string inputFileName = myKey.toName("__")+"_input";
        GDALMemoryFile inputMemFile(inputFileName, *skyFile);
        GDALDataset* poDataset = inputMemFile.getGDALDataset();

        // then how to create a new HDFS file by gdal and add to outputHist
        ...
      }
    opened by xuyan1972 1
  • build_all issue


    g++ -std=c++0x -I./src/cpp/api -I./test/cpp/api -fPIC -Wall -o objs/suite/Suite.o -c test/cpp/suites/Suite.cpp g++ -std=c++0x -Wall -rdynamic -o exe/run_tests ./objs/impl/context/Logger.o ./objs/impl/context/Message.o ./objs/impl/context/AlgorithmContext.o ./objs/impl/algorithm/AlgorithmRunner.o ./objs/impl/algorithm/AlgorithmAutoRegister.o ./objs/impl/algorithm/Algorithm.o ./objs/impl/algorithm/AlgorithmRegistry.o ./objs/impl/algorithm/AlgorithmConfig.o ./objs/impl/algorithm/AlgorithmData.o ./objs/impl/error/ErrorReporter.o ./objs/impl/error/JsonErrorSerializer.o ./objs/impl/error/Error.o ./objs/impl/util/IOUtil.o ./objs/impl/util/MR4CEnvironment.o ./objs/impl/util/StackUtil.o ./objs/impl/util/MR4CLogging.o ./objs/impl/util/Properties.o ./objs/impl/util/MR4CTempFiles.o ./objs/impl/keys/KeyspaceDimension.o ./objs/impl/keys/DataKey.o ./objs/impl/keys/DataKeyBuilder.o ./objs/impl/keys/KeyspaceBuilder.o ./objs/impl/keys/Keyspace.o ./objs/impl/keys/DataKeyDimension.o ./objs/impl/keys/DataKeyElement.o ./objs/impl/metadata/MetadataKey.o ./objs/impl/metadata/MetadataField.o ./objs/impl/metadata/MetadataMap.o ./objs/impl/metadata/MetadataList.o ./objs/impl/metadata/MetadataElement.o ./objs/impl/metadata/MetadataArray.o ./objs/impl/metadata/Primitive.o ./objs/impl/catalog/DimensionCatalog.o ./objs/impl/catalog/ImageTypes.o ./objs/impl/serialize/json/JsonAlgorithmConfigSerializer.o ./objs/impl/serialize/json/JsonCommonSerializer.o ./objs/impl/serialize/json/JsonKeyspaceSerializer.o ./objs/impl/serialize/json/JsonDatasetSerializer.o ./objs/impl/serialize/json/JsonPropertiesSerializer.o ./objs/impl/serialize/json/JanssonUtil.o ./objs/impl/serialize/json/JsonSerializerFactory.o ./objs/impl/serialize/json/JsonAlgorithmSerializer.o ./objs/impl/serialize/SerializerRegistry.o ./objs/impl/dataset/SimpleDataFileSource.o ./objs/impl/dataset/DataFile.o ./objs/impl/dataset/DatasetContext.o ./objs/impl/dataset/LocalTempFile.o ./objs/impl/dataset/LocalDataFileSink.o 
./objs/impl/dataset/LocalDataFileSource.o ./objs/impl/dataset/Dataset.o ./objs/impl/dataset/DataFileSource.o ./objs/impl/external/CExternalDataFile.o ./objs/impl/external/CExternalContext.o ./objs/impl/external/ExternalDatasetContext.o ./objs/impl/external/ExternalDataFile.o ./objs/impl/external/CExternalDataset.o ./objs/impl/external/CExternalAlgorithmData.o ./objs/impl/external/ExternalRandomAccessFileSource.o ./objs/impl/external/CExternalEntry.o ./objs/impl/external/ExternalEntry.o ./objs/impl/external/ExternalAlgorithmData.o ./objs/impl/external/CExternalDataFileSource.o ./objs/impl/external/CExternalAlgorithm.o ./objs/impl/external/ExternalAlgorithmSerializer.o ./objs/impl/external/ExternalContext.o ./objs/impl/external/ExternalDataFileSink.o ./objs/impl/external/ExternalDataset.o ./objs/impl/external/ExternalAlgorithmDataSerializer.o ./objs/impl/external/CExternalRandomAccessFileSource.o ./objs/impl/external/CExternalDataFileSink.o ./objs/impl/external/ExternalRandomAccessFile.o ./objs/impl/external/ExternalRandomAccessFileSink.o ./objs/impl/external/ExternalDatasetSerializer.o ./objs/impl/external/ExternalAlgorithm.o ./objs/impl/external/ExternalDataFileSource.o ./objs/impl/external/CExternalRandomAccessFileSink.o ./objs/test/context/ContextTests.o ./objs/test/context/TestMessage.o ./objs/test/algorithm/TestAlgorithmConfig.o ./objs/test/algorithm/TestAlgorithmData.o ./objs/test/algorithm/AlgorithmTests.o ./objs/test/algorithm/AlgorithmDataTestUtil.o ./objs/test/error/ErrorTests.o ./objs/test/error/TestJsonErrorSerializer.o ./objs/test/error/TestError.o ./objs/test/util/TestArrayUtil.o ./objs/test/util/UtilTests.o ./objs/test/util/TestMR4CTempFiles.o ./objs/test/util/TestIOUtil.o ./objs/test/util/TestProperties.o ./objs/test/util/TestMR4CLogging.o ./objs/test/util/TestMR4CEnvironment.o ./objs/test/keys/TestKeyspaceDimension.o ./objs/test/keys/TestDataKey.o ./objs/test/keys/KeyspaceTestUtil.o ./objs/test/keys/TestKeyspace.o 
./objs/test/keys/TestKeyspaceBuilder.o ./objs/test/keys/TestDataKeyDimension.o ./objs/test/keys/KeysTests.o ./objs/test/keys/TestDataKeyBuilder.o ./objs/test/keys/TestDataKeyElement.o ./objs/test/metadata/TestPrimitive.o ./objs/test/metadata/TestMetadataList.o ./objs/test/metadata/TestMetadataArray.o ./objs/test/metadata/TestMetadataKey.o ./objs/test/metadata/TestMetadataMap.o ./objs/test/metadata/TestMetadataField.o ./objs/test/metadata/MetadataTests.o ./objs/test/multithread/MultithreadTests.o ./objs/test/multithread/TestExternalRandomAccessFileSinkMultithread.o ./objs/test/multithread/SimultaneousThreadRunner.o ./objs/test/multithread/TestExternalRandomAccessFileMultithread.o ./objs/test/multithread/TestDatasetMultithread.o ./objs/test/multithread/TestDataFileMultithread.o ./objs/test/MR4CTests.o ./objs/test/serialize/json/TestJsonAlgorithmConfigSerializer.o ./objs/test/serialize/json/TestJsonKeyspaceSerializer.o ./objs/test/serialize/json/TestJsonAlgorithmSerializer.o ./objs/test/serialize/json/TestJsonPropertiesSerializer.o ./objs/test/serialize/json/JsonTests.o ./objs/test/serialize/json/TestJsonDatasetSerializer.o ./objs/test/dataset/TestLocalDataFileSink.o ./objs/test/dataset/TestDataFile.o ./objs/test/dataset/TestLocalDataFileSource.o ./objs/test/dataset/TestDataset.o ./objs/test/dataset/DatasetTests.o ./objs/test/dataset/DatasetTestUtil.o ./objs/test/dataset/TestSimpleDataFileSource.o ./objs/test/dataset/TestLocalTempFile.o ./objs/test/external/TestExternalEntry.o ./objs/test/external/ExternalTests.o ./objs/test/external/TestExternalAlgorithmDataSerializer.o ./objs/test/external/TestExternalDatasetSerializer.o ./objs/suite/Suite.o -llog4cxx -ljansson -lcppunit /usr/local/glibc/lib/libpthread.so.0: undefined reference to `memcpy@GLIBC_2.14' collect2: error: ld returned 1 exit status make: *** [exe/run_tests] Error 1

    How can I solve this? I installed glibc 2.14 and put its lib path in the LD_LIBRARY_PATH variable.

    opened by xuyan1972 1
  • A question about the keyspace.


    I have run the examples, and I have a question about the keyspace. I read the description of the keyspace:

    "The keyspace is an index of unique elements in the dataset. Each key refers to a particular piece of the data without having to keep track of a lot of paths. This can be especially handy when we are operating on a large cluster where all of the files are not necessarily local."

    I think it means that every file has a key, instead of every record in the file. Am I right?
    opened by ghost 1
  • Build_all error


    All dependencies are installed, but it still doesn't work properly.

    error.txt

    I'm getting a lot of errors, too; I don't know if that is normal.

    collect2: error: ld returned 1 exit status makefile:152: recipe for target 'exe/run_tests' failed make: *** [exe/run_tests] Error 1

    Those are the last output lines in the terminal. I'm running on Ubuntu.

    opened by edutra 1
  • when i build_yarn, do-test errors


    There are no logs for this. What can I do? Who can help me? Thanks.

    do-compile:

    compile:

    do-test: [junit] TEST com.google.mr4c.AlgoRunnerTest FAILED [junit] TEST com.google.mr4c.algorithm.AlgorithmDataTest FAILED [junit] TEST com.google.mr4c.algorithm.AlgorithmSchemaTest FAILED [junit] TEST com.google.mr4c.config.ConfigDescriptorTest FAILED [junit] TEST com.google.mr4c.config.ConfigUtilsTest FAILED [junit] TEST com.google.mr4c.config.algorithm.AlgorithmConfigTest FAILED [junit] TEST com.google.mr4c.config.algorithm.DimensionConfigTest FAILED [junit] TEST com.google.mr4c.config.category.CategoryBuilderTest FAILED [junit] TEST com.google.mr4c.config.category.CategoryConfigTest FAILED [junit] TEST com.google.mr4c.config.category.CategoryParserTest FAILED [junit] TEST com.google.mr4c.config.category.MR4CConfigBuilderTest FAILED [junit] TEST com.google.mr4c.config.category.MR4CConfigTest FAILED [junit] TEST com.google.mr4c.config.diff.DiffConfigTest FAILED [junit] TEST com.google.mr4c.config.execution.DatasetConfigTest FAILED [junit] TEST com.google.mr4c.config.execution.DirectoryConfigTest FAILED [junit] TEST com.google.mr4c.config.execution.ExecutionConfigTest FAILED [junit] TEST com.google.mr4c.config.execution.LocationsConfigTest FAILED [junit] TEST com.google.mr4c.config.execution.MapConfigTest FAILED [junit] TEST com.google.mr4c.config.execution.PatternMapperConfigTest FAILED [junit] TEST com.google.mr4c.config.resources.LimitSourceTest FAILED [junit] TEST com.google.mr4c.config.resources.ResourceConfigTest FAILED [junit] TEST com.google.mr4c.config.resources.ResourceLimitTest FAILED [junit] TEST com.google.mr4c.config.resources.ResourceRequestTest FAILED [junit] TEST com.google.mr4c.config.site.ClusterConfigTest FAILED [junit] TEST com.google.mr4c.config.site.SiteConfigTest FAILED [junit] TEST com.google.mr4c.config.test.AlgoTestConfigTest FAILED [junit] TEST com.google.mr4c.content.RelativeContentFactoryTest FAILED [junit] TEST com.google.mr4c.content.S3CredentialsTest FAILED [junit] TEST com.google.mr4c.dataset.DataFileTest FAILED [junit] TEST 
com.google.mr4c.dataset.DatasetDiffTest FAILED [junit] TEST com.google.mr4c.dataset.DatasetTest FAILED [junit] TEST com.google.mr4c.dataset.DatasetTransformerTest FAILED [junit] TEST com.google.mr4c.dataset.LogsDatasetBuilderTest FAILED [junit] TEST com.google.mr4c.hadoop.ClusterTest FAILED [junit] TEST com.google.mr4c.hadoop.DataKeyListTest FAILED [junit] TEST com.google.mr4c.hadoop.DataLocalizerTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CArgumentParserTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CGenericOptionsParserTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CGenericOptionsTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CInputFormatTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CInputSplitTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CMRJobTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CMapperTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CRecordWriterTest FAILED [junit] TEST com.google.mr4c.hadoop.MR4CReducerTest FAILED [junit] TEST com.google.mr4c.keys.BasicDataKeyFilterTest FAILED [junit] TEST com.google.mr4c.keys.BasicElementFilterTest FAILED [junit] TEST com.google.mr4c.keys.CompoundDataKeyTest FAILED [junit] TEST com.google.mr4c.keys.DataKeyComparatorTest FAILED [junit] TEST com.google.mr4c.keys.DataKeyDimensionTest FAILED [junit] TEST com.google.mr4c.keys.DataKeyElementTest FAILED [junit] TEST com.google.mr4c.keys.DataKeyUtilsTest FAILED [junit] TEST com.google.mr4c.keys.DimensionBasedKeyFilterTest FAILED [junit] TEST com.google.mr4c.keys.ElementTransformerTest FAILED [junit] TEST com.google.mr4c.keys.KeyDimensionPartitionerTest FAILED [junit] TEST com.google.mr4c.keys.KeyTransformerTest FAILED [junit] TEST com.google.mr4c.keys.KeyspaceDimensionTest FAILED [junit] TEST com.google.mr4c.keys.KeyspacePartitionerTest FAILED [junit] TEST com.google.mr4c.keys.KeyspaceTest FAILED [junit] TEST com.google.mr4c.keys.SimpleDataKeyTest FAILED [junit] TEST com.google.mr4c.mbtiles.MBTilesFileTest FAILED [junit] TEST 
com.google.mr4c.mbtiles.TileFormatTest FAILED [junit] TEST com.google.mr4c.mbtiles.TileKeyTest FAILED [junit] TEST com.google.mr4c.mbtiles.TileTest FAILED [junit] TEST com.google.mr4c.message.MessageTest FAILED [junit] TEST com.google.mr4c.metadata.MetadataArrayTest FAILED [junit] TEST com.google.mr4c.metadata.MetadataFieldTest FAILED [junit] TEST com.google.mr4c.metadata.MetadataKeyExtractorTest FAILED [junit] TEST com.google.mr4c.metadata.MetadataKeyTest FAILED [junit] TEST com.google.mr4c.metadata.MetadataListTest FAILED [junit] TEST com.google.mr4c.metadata.MetadataMapTest FAILED [junit] TEST com.google.mr4c.nativec.ExternalAlgorithmDataSerializerTest FAILED [junit] TEST com.google.mr4c.nativec.ExternalAlgorithmSerializerTest FAILED [junit] TEST com.google.mr4c.nativec.ExternalDatasetSerializerTest FAILED [junit] TEST com.google.mr4c.nativec.jna.JnaExternalEntryTest FAILED [junit] TEST com.google.mr4c.serialize.bean.BeanBasedAlgorithmSerializerTest FAILED [junit] TEST com.google.mr4c.serialize.bean.BeanBasedDatasetSerializerTest FAILED [junit] TEST com.google.mr4c.serialize.bean.BeanBasedKeyspaceSerializerTest FAILED [junit] TEST com.google.mr4c.serialize.bean.algorithm.AlgorithmSchemaBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.dataset.DataFileBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.dataset.DatasetBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.keys.DataKeyBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.keys.DataKeyElementBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.metadata.MetadataArrayBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.metadata.MetadataFieldBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.metadata.MetadataKeyBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.metadata.MetadataListBeanTest FAILED [junit] TEST com.google.mr4c.serialize.bean.metadata.MetadataMapBeanTest FAILED [junit] TEST 
com.google.mr4c.serialize.json.JsonAlgorithmBeanSerializerTest FAILED [junit] TEST com.google.mr4c.serialize.json.JsonConfigSerializerTest FAILED [junit] TEST com.google.mr4c.serialize.json.JsonDatasetBeanSerializerTest FAILED [junit] TEST com.google.mr4c.serialize.json.JsonKeyspaceBeanSerializerTest FAILED [junit] TEST com.google.mr4c.serialize.json.JsonPropertiesSerializerTest FAILED [junit] TEST com.google.mr4c.sources.ArchiveDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.BinaryDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.CompositeKeyFileMapperTest FAILED [junit] TEST com.google.mr4c.sources.DiskFileSourceTest FAILED [junit] TEST com.google.mr4c.sources.FilesDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.HDFSFileSourceTest FAILED [junit] TEST com.google.mr4c.sources.HeterogenousFileSourceTest FAILED [junit] TEST com.google.mr4c.sources.InMemoryArchiveSourceTest FAILED [junit] TEST com.google.mr4c.sources.InMemoryFileSourceTest FAILED [junit] TEST com.google.mr4c.sources.LogsDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.MBTilesDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.MapFileSourceDFSTest FAILED [junit] TEST com.google.mr4c.sources.MapFileSourceLocalTest FAILED [junit] TEST com.google.mr4c.sources.MetafilesDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.PatternKeyFileMapperTest FAILED [junit] TEST com.google.mr4c.sources.RandomAccessFileSinkTest FAILED [junit] TEST com.google.mr4c.sources.RandomAccessFileSourceTest FAILED [junit] TEST com.google.mr4c.sources.SimpleDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.SourceUtilsTest FAILED [junit] TEST com.google.mr4c.sources.StagedDatasetSourceTest FAILED [junit] TEST com.google.mr4c.sources.TransformedDatasetSourceTest FAILED [junit] TEST com.google.mr4c.util.ByteBufferInputStreamTest FAILED [junit] TEST com.google.mr4c.util.CollectionUtilsTest FAILED [junit] TEST com.google.mr4c.util.CombinatoricUtilsTest 
FAILED [junit] TEST com.google.mr4c.util.CustomFormatTest FAILED [junit] TEST com.google.mr4c.util.MR4CLoggingTest FAILED [junit] TEST com.google.mr4c.util.NamespacedPropertiesTest FAILED [junit] TEST com.google.mr4c.util.PartitionerTest FAILED [junit] TEST com.google.mr4c.util.PathUtilsTest FAILED [junit] TEST com.google.mr4c.util.SetAnalysisTest FAILED [junit] Tests FAILED

    BUILD FAILED /home/zhangsh/code/mr4c/java/build.xml:231: if=true

    Total time: 11 seconds

    opened by briantzhang 2
  • MessageConsumer API missing callback to receiveMessage (for http)


    The current implementation of MR4C provides a default HttpMessageHandler, which supplies the plumbing to send messages to a registered topic URL.

    This works, and sends topic messages to the remote URL.

    HttpMessageHandler

    There is currently no mechanism to receive HTTP messages via the same or a similar route.

    The API is there: MessageConsumer, for which you implement virtual void receiveMessage(const Message& msg).

    However, MR4C is missing an HTTP message handler that receives messages and passes them through to any registered MessageConsumers.

    opened by rbuckland 0
  • MR4C does not use HADOOP_CONF_DIR


    MR4C uses the hard-coded directory /etc/hadoop/conf as the location for the Hadoop configuration files. It would be more convenient to be able to control this location via the standard Hadoop environment variable HADOOP_CONF_DIR.

    opened by kshaf 0
Owner

Google (Google ❤️ Open Source)