Real-time Query for Hadoop; mirror of Apache Impala

Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.

Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources:

  • Best of breed performance and scalability.
  • Support for data stored in HDFS, Apache HBase and Amazon S3.
  • Wide analytic SQL support, including window functions and subqueries.
  • On-the-fly code generation with LLVM, producing CPU-efficient code tailored to each individual query.
  • Support for the most commonly used Hadoop file formats, including Apache Parquet.
  • Apache-licensed, 100% open source.

More about Impala

To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.

Supported Platforms

Impala only supports Linux at the moment.

Export Control Notice

This distribution uses cryptographic software and may be subject to export controls. Please refer to EXPORT_CONTROL.md for more information.

Build Instructions

See bin/bootstrap_build.sh.

Detailed Build Notes

Impala can be built with pre-built components or components downloaded from S3. The components needed to build Impala are Apache Hadoop, Hive, HBase, and Sentry. If you need to manually override the locations or versions of these components, you can do so through the environment variables and scripts listed below.

Scripts and directories
Location Purpose
bin/impala-config.sh This script must be sourced to set up all environment variables so that the other scripts work properly.
bin/impala-config-local.sh A script can be created in this location to set local overrides for any environment variables
bin/impala-config-branch.sh A version of the above that can be checked into a branch for convenience.
bin/bootstrap_build.sh A helper script to bootstrap some of the build requirements.
bin/bootstrap_development.sh A helper script to bootstrap a developer environment. Please read it before using.
be/build/ Impala build output goes here.
be/generated-sources/ Thrift and other generated source will be found here.
Build Related Variables
Environment variable Default value Description
IMPALA_HOME Top level Impala directory
IMPALA_TOOLCHAIN "${IMPALA_HOME}/toolchain" Native toolchain directory (for compilers, libraries, etc.)
SKIP_TOOLCHAIN_BOOTSTRAP "false" Skips downloading the toolchain and any Python dependencies if "true"
CDH_BUILD_NUMBER Identifier to indicate the CDH build number
CDH_COMPONENTS_HOME "${IMPALA_HOME}/toolchain/cdh_components-${CDH_BUILD_NUMBER}" Location of the CDH components within the toolchain.
CDH_MAJOR_VERSION "5" Identifier used to make paths unique for potentially incompatible component builds.
IMPALA_CONFIG_SOURCED "1" Set by ${IMPALA_HOME}/bin/impala-config.sh (internal use)
JAVA_HOME "/usr/lib/jvm/${JAVA_VERSION}" Used to locate Java
JAVA_VERSION "java-7-oracle-amd64" Can override to set a local Java version.
JAVA "${JAVA_HOME}/bin/java" Java binary location.
CLASSPATH See bin/set-classpath.sh for details.
PYTHONPATH Will be changed to include: "${IMPALA_HOME}/shell/gen-py" "${IMPALA_HOME}/testdata" "${THRIFT_HOME}/python/lib/python2.7/site-packages" "${HIVE_HOME}/lib/py" "${IMPALA_HOME}/shell/ext-py/prettytable-0.7.1/dist/prettytable-0.7.1" "${IMPALA_HOME}/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x "${IMPALA_HOME}/shell/ext-py/sqlparse-0.1.19/dist/sqlparse-0.1.19-py2
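The defaults in the table above chain together: later variables are built from earlier ones, and a value already set in the environment (for example in bin/impala-config-local.sh) is meant to override the default. The following is a minimal Python sketch of that resolution order, not the logic of bin/impala-config.sh itself. The `resolve` and `is_true` helpers are hypothetical, and the rule that any pre-set value always wins is a simplifying assumption.

```python
from string import Template

# Hypothetical subset of the defaults listed above, in resolution order:
# later templates may refer to variables resolved earlier.
DEFAULTS = [
    ("IMPALA_TOOLCHAIN", "${IMPALA_HOME}/toolchain"),
    ("SKIP_TOOLCHAIN_BOOTSTRAP", "false"),
    ("CDH_MAJOR_VERSION", "5"),
    ("CDH_COMPONENTS_HOME",
     "${IMPALA_HOME}/toolchain/cdh_components-${CDH_BUILD_NUMBER}"),
    ("JAVA_VERSION", "java-7-oracle-amd64"),
    ("JAVA_HOME", "/usr/lib/jvm/${JAVA_VERSION}"),
    ("JAVA", "${JAVA_HOME}/bin/java"),
]


def resolve(env):
    """Return a copy of env with every unset variable filled from DEFAULTS.

    A value already present in env (exported by the user, or set in
    bin/impala-config-local.sh) is kept as-is.
    """
    resolved = dict(env)
    for name, template in DEFAULTS:
        if name not in resolved:
            # substitute() raises KeyError if a referenced variable is
            # missing, e.g. IMPALA_HOME or CDH_BUILD_NUMBER (no defaults).
            resolved[name] = Template(template).substitute(resolved)
    return resolved


def is_true(value):
    # Boolean-style flags such as SKIP_TOOLCHAIN_BOOTSTRAP use "true".
    return value == "true"
```

For example, `resolve({"IMPALA_HOME": "/src/impala", "CDH_BUILD_NUMBER": "123"})["JAVA"]` yields `/usr/lib/jvm/java-7-oracle-amd64/bin/java`, while passing `"JAVA_HOME": "/opt/jdk"` as well yields `/opt/jdk/bin/java`. Variables with no default, such as IMPALA_HOME, surface as a KeyError instead of silently producing a broken path.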
Source Directories for Impala
Environment variable Default value Description
IMPALA_BE_DIR "${IMPALA_HOME}/be" Backend directory. Build output is also stored here.
IMPALA_FE_DIR "${IMPALA_HOME}/fe" Frontend directory
IMPALA_COMMON_DIR "${IMPALA_HOME}/common" Common code (thrift, function registry)
Various Compilation Settings
Environment variable Default value Description
IMPALA_BUILD_THREADS "8" or set to number of processors by default. Used for make -j and distcc -j settings.
IMPALA_MAKE_FLAGS "" Any extra settings to pass to make. Also used when copying UDFs / UDAs into HDFS.
USE_SYSTEM_GCC "0" If set to any other value, directs cmake not to set GCC_ROOT, CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, or TOOLCHAIN_LINK_FLAGS.
IMPALA_CXX_COMPILER "default" Used by cmake (cmake_modules/toolchain and clang_toolchain.cmake) to select gcc / clang
USE_GOLD_LINKER "true" Directs backend cmake to use the gold linker.
IS_OSX "false" (Experimental) currently only used to disable Kudu.
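As a rough illustration of how the compilation settings above feed into a build, here is a minimal Python sketch. The helpers and the make target name are hypothetical; falling back to 8 threads only when the processor count is unknown, and comparing USE_GOLD_LINKER against the literal string "true", are assumptions rather than documented behavior.

```python
import os
import shlex


def build_threads(env):
    # An explicit IMPALA_BUILD_THREADS wins; otherwise use the processor
    # count, falling back to 8 if it cannot be determined (assumption).
    value = env.get("IMPALA_BUILD_THREADS")
    if value:
        return int(value)
    return os.cpu_count() or 8


def make_command(env, target):
    # IMPALA_MAKE_FLAGS is split shell-style and passed through to make.
    cmd = ["make", f"-j{build_threads(env)}"]
    cmd += shlex.split(env.get("IMPALA_MAKE_FLAGS", ""))
    cmd.append(target)
    return cmd


def uses_toolchain_gcc(env):
    # USE_SYSTEM_GCC defaults to "0"; any other value means cmake should
    # not be pointed at the toolchain compilers.
    return env.get("USE_SYSTEM_GCC", "0") == "0"


def uses_gold_linker(env):
    # Assumes the flag is compared against the literal string "true".
    return env.get("USE_GOLD_LINKER", "true") == "true"
```

With `IMPALA_BUILD_THREADS=4` and `IMPALA_MAKE_FLAGS="-k VERBOSE=1"`, `make_command(env, "impalad")` produces `["make", "-j4", "-k", "VERBOSE=1", "impalad"]`.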
Dependencies
Environment variable Default value Description
HADOOP_HOME "${CDH_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" Used to locate Hadoop
HADOOP_INCLUDE_DIR "${HADOOP_HOME}/include" For 'hdfs.h'
HADOOP_LIB_DIR "${HADOOP_HOME}/lib" For 'libhdfs.a' or 'libhdfs.so'
HIVE_HOME "${CDH_COMPONENTS_HOME}/hive-${IMPALA_HIVE_VERSION}/"
HIVE_SRC_DIR "${HIVE_HOME}/src" Used to find Hive thrift files.
HBASE_HOME "${CDH_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/"
SENTRY_HOME "${CDH_COMPONENTS_HOME}/sentry-${IMPALA_SENTRY_VERSION}/" Used to set up test data
THRIFT_HOME "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_VERSION}"
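The dependency paths follow the same pattern: HADOOP_INCLUDE_DIR and HADOOP_LIB_DIR hang off HADOOP_HOME, which hangs off CDH_COMPONENTS_HOME. A minimal Python sketch, assuming only the templates in the table above; `resolve_dependencies` and `find_libhdfs` are hypothetical helpers, and preferring the static library when both exist is an assumption.

```python
import os
from string import Template

# Subset of the Dependencies table above, in resolution order.
DEPENDENCY_DEFAULTS = [
    ("HADOOP_HOME", "${CDH_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/"),
    ("HADOOP_INCLUDE_DIR", "${HADOOP_HOME}/include"),
    ("HADOOP_LIB_DIR", "${HADOOP_HOME}/lib"),
    ("HIVE_HOME", "${CDH_COMPONENTS_HOME}/hive-${IMPALA_HIVE_VERSION}/"),
    ("HIVE_SRC_DIR", "${HIVE_HOME}/src"),
]


def resolve_dependencies(env):
    resolved = dict(env)
    for name, template in DEPENDENCY_DEFAULTS:
        if name not in resolved:
            # normpath drops the trailing "/" of the *_HOME defaults so that
            # "${HADOOP_HOME}/include" does not end up containing "//".
            value = Template(template).substitute(resolved)
            resolved[name] = os.path.normpath(value)
    return resolved


def find_libhdfs(lib_dir, exists=os.path.isfile):
    # HADOOP_LIB_DIR should hold libhdfs.a or libhdfs.so; return the
    # first one found. `exists` is injectable to make testing easy.
    for name in ("libhdfs.a", "libhdfs.so"):
        path = os.path.join(lib_dir, name)
        if exists(path):
            return path
    return None
```

The version numbers you pass in (for example IMPALA_HADOOP_VERSION) are placeholders here; in a real build they come from bin/impala-config.sh.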
Comments
  • Ubuntu build fixes and instructions

    This pull request makes Impala build on Ubuntu 12.04. It basically contains these changes:

    • Fix the linker settings to use --no-as-needed for executables (--as-needed is default on current Ubuntu distributions).
    • Fix boost linking setup.
    • Add an explicit copy assignment operator to FragmentExecParams to work around a bug with Boost 1.4x and newer compilers where Boost generates a copy assignment operator that takes a non-const argument, but newer compilers/STL require a const argument.
    • Add support for versioned LLVM binaries (e.g. llvm-config-3.0), since the LLVM Ubuntu/Debian packages don't (currently) use the alternatives system for these.
    • Fix the maven repository settings (#8).
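    The versioned-binary change amounts to searching for several candidate names. The PR did this in the build scripts; the following is only a Python sketch of the same lookup order, with hypothetical helper names and an illustrative default version list.

```python
import shutil


def llvm_config_candidates(versions):
    # Unversioned name first, then versioned names such as llvm-config-3.0.
    return ["llvm-config"] + [f"llvm-config-{v}" for v in versions]


def find_llvm_config(versions=("3.0",), path=None):
    """Return the full path of the first candidate found on PATH, or None.

    `path` overrides the PATH search string, as in shutil.which().
    """
    for name in llvm_config_candidates(versions):
        found = shutil.which(name, path=path)
        if found:
            return found
    return None
```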
    opened by tomdz 5
  • Support table name tab auto-completion in impala_shell.py

    Impala Shell doesn't support tab auto-completion of table names, so I tried to add some code to make it possible. After "USE some_db", impala-shell collects all table names in the background by running "SHOW TABLES". It can then offer tab auto-completion of table names in other commands, such as SELECT.
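    The approach can be sketched with Python's standard `cmd` module, assuming a shell built on `cmd.Cmd`: `do_use` refreshes a cached list of table names, and `complete_<command>` methods return the cached names that match the typed prefix. `ShellSketch` and the `fetch_tables` callback (standing in for running SHOW TABLES against impalad) are hypothetical, not the PR's code.

```python
import cmd


class ShellSketch(cmd.Cmd):
    """Hypothetical, stripped-down shell showing table-name completion."""

    prompt = "> "

    def __init__(self, fetch_tables):
        super().__init__()
        # fetch_tables(db) stands in for running "SHOW TABLES" in db.
        self._fetch_tables = fetch_tables
        self._tables = []

    def do_use(self, db):
        # Refresh the cached table names whenever the database changes.
        self._tables = sorted(self._fetch_tables(db.strip()))

    def complete_select(self, text, line, begidx, endidx):
        # cmd.Cmd calls complete_<command> with the word being completed.
        return [t for t in self._tables if t.startswith(text)]

    # Reuse the same completer for other commands that take table names.
    complete_describe = complete_select
```

    Caching the names on USE keeps each Tab press local instead of issuing a query per keystroke, at the cost of missing tables created after the USE.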

    opened by wangruowen 2
  • Fix multi-node planning of approximate distinct aggregation.

    The first phase of a distributed approximate distinct aggregation is an intermediate node and should not finalize the aggregate tuple. This was causing incorrect results for the distinctpc() and distinctpcsa() operators, when creating a multi-node plan. As with other aggregates this fix does not address what happens when there is also a distinct aggregation.
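    The fix hinges on the phases of an approximate distinct count: each node folds its rows into a bitmap, intermediate nodes only merge bitmaps, and only the final node turns the merged bitmap into a number. A minimal Python sketch of single-bitmap Flajolet-Martin probabilistic counting (the PC technique that distinctpc() is named after; PCSA averages several such bitmaps) shows why. Merging bitmaps is exact, while estimates finalized early could not be combined without double-counting values seen on several nodes. The hash choice and helper names are illustrative assumptions, not Impala's implementation.

```python
import hashlib

NUM_BITS = 32
PHI = 0.77351  # Flajolet-Martin bias-correction constant


def _hash32(value):
    # Deterministic 32-bit hash so every node maps a value to the same bit.
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "little")


def _lowest_set_bit(x):
    # 0-based position of the least significant 1 bit; clamp x == 0.
    return (x & -x).bit_length() - 1 if x else NUM_BITS - 1


def update(bitmap, value):
    """First phase: fold one input value into a node-local bitmap."""
    return bitmap | (1 << _lowest_set_bit(_hash32(value)))


def build(values):
    bitmap = 0
    for v in values:
        bitmap = update(bitmap, v)
    return bitmap


def merge(a, b):
    """Intermediate phase: combine partial bitmaps, do NOT finalize."""
    return a | b


def finalize(bitmap):
    """Final phase only: turn the merged bitmap into an estimate."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / PHI
```

    Because OR is idempotent, `merge(build([1, 2, 3, 4, 5]), build([4, 5, 6, 7]))` equals `build(range(1, 8))`, so the overlapping values 4 and 5 are counted once.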

    opened by superdupershant 2
  • Build thrift automatically as part of the thirdparty build step

    This makes the build use the Thrift that download_thirdparty.sh already downloads, instead of requiring the user to download and install it manually. It also fixes two typos in the README and adds support for custom CMake arguments (via the CMAKE_ARGS environment variable).
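    Honoring a CMAKE_ARGS variable usually means splitting it shell-style and splicing it into the cmake command line. A minimal Python sketch of that idea; `cmake_command` is a hypothetical helper, not the script changed in this PR.

```python
import os
import shlex


def cmake_command(source_dir, env=None):
    """Build a cmake argv, inserting any user-supplied CMAKE_ARGS.

    Quoted arguments such as -G 'Unix Makefiles' stay a single argument.
    """
    env = os.environ if env is None else env
    return ["cmake", *shlex.split(env.get("CMAKE_ARGS", "")), source_dir]
```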

    opened by tomdz 2
  • Build(deps): Bump numpy from 1.10.4 to 1.21.0 in /infra/python/deps

    Bumps numpy from 1.10.4 to 1.21.0.

    Release notes

    Sourced from numpy's releases.

    v1.21.0

    NumPy 1.21.0 Release Notes

    The NumPy 1.21.0 release highlights are

    • continued SIMD work covering more functions and platforms,
    • initial work on the new dtype infrastructure and casting,
    • universal2 wheels for Python 3.8 and Python 3.9 on Mac,
    • improved documentation,
    • improved annotations,
    • new PCG64DXSM bitgenerator for random numbers.

    In addition there are the usual large number of bug fixes and other improvements.

    The Python versions supported for this release are 3.7-3.9. Official support for Python 3.10 will be added when it is released.

    Warning: there are unresolved problems compiling NumPy 1.21.0 with gcc-11.1.

    • Optimization level -O3 results in many wrong warnings when running the tests.
    • On some hardware NumPy will hang in an infinite loop.

    New functions

    Add PCG64DXSM BitGenerator

    Uses of the PCG64 BitGenerator in a massively-parallel context have been shown to have statistical weaknesses that were not apparent at the first release in numpy 1.17. Most users will never observe this weakness and are safe to continue to use PCG64. We have introduced a new PCG64DXSM BitGenerator that will eventually become the new default BitGenerator implementation used by default_rng in future releases. PCG64DXSM solves the statistical weakness while preserving the performance and the features of PCG64.

    See upgrading-pcg64 for more details.

    (gh-18906)

    Expired deprecations

    • The shape argument numpy.unravel_index cannot be passed as dims keyword argument anymore. (Was deprecated in NumPy 1.16.)

    ... (truncated)

    Commits
    • b235f9e Merge pull request #19283 from charris/prepare-1.21.0-release
    • 34aebc2 MAINT: Update 1.21.0-notes.rst
    • 493b64b MAINT: Update 1.21.0-changelog.rst
    • 07d7e72 MAINT: Remove accidentally created directory.
    • 032fca5 Merge pull request #19280 from charris/backport-19277
    • 7d25b81 BUG: Fix refcount leak in ResultType
    • fa5754e BUG: Add missing DECREF in new path
    • 61127bb Merge pull request #19268 from charris/backport-19264
    • 143d45f Merge pull request #19269 from charris/backport-19228
    • d80e473 BUG: Removed typing for == and != in dtypes
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 1
  • Build(deps): Bump paramiko from 2.4.1 to 2.4.2 in /infra/python/deps

    Bumps paramiko from 2.4.1 to 2.4.2.

    Commits

    dependencies 
    opened by dependabot[bot] 1
  • webserver option to set X-Frame-Options header

    Hello.

    This is a tiny but really useful patch for me. I was not able to test it on a live system, so I would really appreciate it if someone could test it for me.
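    For anyone testing the patch, the expected behavior is simply an extra response header that tells browsers whether the page may be rendered inside a frame. Impala's webserver is C++, so the following is only a Python stand-in built on the standard `http.server` module; the `X_FRAME_OPTIONS` constant and helper names are hypothetical, not the option added by this PR.

```python
import http.client
import http.server
import threading

# Hypothetical configured value; browsers also accept "SAMEORIGIN".
X_FRAME_OPTIONS = "DENY"


class FrameOptionsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        if X_FRAME_OPTIONS:
            # Tells the browser whether this page may be shown in a frame.
            self.send_header("X-Frame-Options", X_FRAME_OPTIONS)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def fetch_frame_options():
    """Serve one request on a free local port and return the header."""
    server = http.server.HTTPServer(("127.0.0.1", 0), FrameOptionsHandler)
    server.timeout = 5  # don't block forever if no request arrives
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        value = response.getheader("X-Frame-Options")
        response.read()
        conn.close()
        return value
    finally:
        thread.join()
        server.server_close()
```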

    opened by oxpa 1
  • Merge branch 'impala-kudu' into cdh5-trunk

    This merges the impala-kudu branch into impala trunk (cdh5-trunk).

    Note that Gerrit does not support merge commits, so this is submitted as a pull request on GitHub.

    All conflicts have been resolved and the functionality minimally tested. Below is the list of files that had conflicts and require careful review.

    This still requires a GVM run.

    Conflicts:

      • CMakeLists.txt
      • be/CMakeLists.txt
      • be/src/exec/CMakeLists.txt
      • be/src/exec/exec-node.cc
      • be/src/testutil/desc-tbl-builder.cc
      • be/src/testutil/desc-tbl-builder.h
      • bin/impala-config.sh
      • common/thrift/PlanNodes.thrift
      • common/thrift/generate_error_codes.py
      • fe/.settings/org.eclipse.jdt.core.prefs
      • fe/src/main/cup/sql-parser.y
      • fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
      • fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java
      • fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
      • fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
      • fe/src/main/java/com/cloudera/impala/analysis/InlineViewRef.java
      • fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
      • fe/src/main/java/com/cloudera/impala/analysis/SelectStmt.java
      • fe/src/main/java/com/cloudera/impala/analysis/StmtRewriter.java
      • fe/src/main/java/com/cloudera/impala/planner/Planner.java
      • fe/src/main/java/com/cloudera/impala/planner/PlannerContext.java
      • testdata/bin/compute-table-stats.sh
      • testdata/bin/generate-schema-statements.py
      • testdata/bin/run-all.sh
      • testdata/datasets/functional/schema_constraints.csv
      • tests/metadata/test_ddl.py
      • tests/query_test/test_hdfs_fd_caching.py

    Change-Id: I4091d7d1805e25256a605d7069c1a2ff594998ea

    opened by dralves 1
  • Why does Impala take so much time to fetch rows?

    I found that most of the query latency is spent fetching rows. The query reaches the FINISHED state in x seconds but only completes much later, depending on the number of rows fetched.

    After several tests, I found that:

    • The query latency when executed in impala-shell is not affected by fetching rows.
    • But the same query executed via JDBC, on the same host as impala-shell, takes much longer to complete.
    • The JDBC fetchSize can't be set to a value greater than 1024 rows. Changing batch_size on the impalad server also has no effect.
    • One difference between impala-shell and JDBC is the port used: 21000 and 21050, respectively.

    However, even over JDBC, if I concatenate all columns into a single column, the query latency is similar to that of the same query executed via impala-shell. I think there is some overhead in processing the columns that increases the latency, depending on the number of columns and rows.

    Can someone explain why this happens?
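    One factor the observations point at is round trips: if each fetch returns at most 1024 rows, the number of client/server round trips grows linearly with the result size. The following is a back-of-the-envelope Python model of that effect, assuming a constant cost per round trip and ignoring the per-column decoding overhead suspected above; the function names and the cost parameter are hypothetical.

```python
import math

MAX_JDBC_FETCH_SIZE = 1024  # cap reported in the issue above


def fetch_round_trips(total_rows, fetch_size):
    # Each fetch returns at most one batch; the requested size is capped.
    batch = max(1, min(fetch_size, MAX_JDBC_FETCH_SIZE))
    return math.ceil(total_rows / batch)


def estimated_fetch_seconds(total_rows, fetch_size, seconds_per_round_trip):
    # Simplified model: ignores per-row and per-column decoding cost.
    return fetch_round_trips(total_rows, fetch_size) * seconds_per_round_trip
```

    Under this model, asking for a fetchSize of 10,000 on a 1,000,000-row result still costs 977 round trips, because the request is capped at 1024 rows.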

    opened by mangonc 1
  • Change the install path of OpenLDAP

    This avoids problems on case-insensitive filesystems, because an INSTALL file already exists in the directory where the install path would be created.
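    The underlying problem is that names differing only in case refer to the same entry on a case-insensitive filesystem. A minimal Python sketch of a pre-flight check for such clashes; `case_collisions` and `collisions_in_dir` are hypothetical helpers, not part of this PR, which simply moves the install path.

```python
import os


def case_collisions(existing_names, new_name):
    """Names that would refer to the same entry as new_name on a
    case-insensitive filesystem (e.g. 'install' vs 'INSTALL')."""
    key = new_name.casefold()
    return [name for name in existing_names if name.casefold() == key]


def collisions_in_dir(directory, new_name):
    # Check a real directory listing before creating new_name inside it.
    return case_collisions(os.listdir(directory), new_name)
```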

    opened by QwertyManiac 1
  • Do not build FB303's Java and PHP bindings

    Since Thrift's Java bindings are not required at all, we can also skip building them for its contrib/fb303 subproject. The same goes for the PHP bindings, which we do not use. We only require the C/C++ and Python bindings.

    opened by QwertyManiac 1
  • Build(deps): Bump setuptools from 36.8.0 to 65.5.1 in /infra/python/deps

    Bumps setuptools from 36.8.0 to 65.5.1.

    Release notes

    Sourced from setuptools's releases.

    No release notes provided for v65.5.1, v65.5.0, v65.4.1, v65.4.0, v65.3.0, v65.2.0, v65.1.1, v65.1.0, v65.0.2, v65.0.1, v65.0.0, v64.0.3, v64.0.2, v64.0.1, v64.0.0, v63.4.3, or v63.4.2.

    ... (truncated)

    Changelog

    Sourced from setuptools's changelog.

    v65.5.1

    Misc

    • #3638: Drop a test dependency on the mock package, always use unittest.mock (by hroncok).
    • #3659: Fixed REDoS vector in package_index.

    v65.5.0

    Changes

    • #3624: Fixed editable install for multi-module/no-package src-layout projects.
    • #3626: Minor refactorings to support distutils using stdlib logging module.

    Documentation changes

    • #3419: Updated the example version numbers to be compliant with PEP-440 on the "Specifying Your Project’s Version" page of the user guide.

    Misc

    • #3569: Improved information about conflicting entries in the current working directory and editable install (in documentation and as an informational warning).
    • #3576: Updated version of validate_pyproject.

    v65.4.1

    Misc

    • #3613: Fixed encoding errors in expand.StaticModule when system default encoding doesn't match expectations for source files.
    • #3617: Merge with pypa/distutils@6852b20 including fix for pypa/distutils#181.

    v65.4.0

    Changes

    v65.3.0

    ... (truncated)

    Commits

    dependencies python 
    opened by dependabot[bot] 0
  • Build(deps): Bump numpy from 1.10.4 to 1.22.0 in /infra/python/deps

    Bumps numpy from 1.10.4 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    dependencies python 
    opened by dependabot[bot] 0
  • Build(deps): Bump paramiko from 2.4.1 to 2.10.1 in /infra/python/deps

    Bumps paramiko from 2.4.1 to 2.10.1.

    Commits
    • 286bd9f Cut 2.10.1
    • 4c491e2 Fix CVE re: PKey.write_private_key chmod race
    • aa3cc6f Cut 2.10.0
    • e50e19f Fix up changelog entry with real links
    • 02ad67e Helps to actually leverage your mocked system calls
    • 29d7bf4 Clearly our agent stuff is not fully tested yet...
    • 5fcb8da OpenSSH docs state %C should also work in IdentityFile and Match exec
    • 1bf3dce Changelog enhancement
    • f6342fc Prettify, add %C as acceptable controlpath token, mock gethostname
    • 3f3451f Add to changelog
    • Additional commits viewable in compare view

    dependencies python 
    opened by dependabot[bot] 0
  • Build(deps): Bump ipython from 1.2.1 to 7.16.3 in /infra/python/deps

    Bumps ipython from 1.2.1 to 7.16.3.

    Release notes

    Sourced from ipython's releases.

    No release notes provided for 7.9.0, 7.8.0, 7.7.0, 7.6.1, 7.6.0, 7.5.0, 7.4.0, 7.3.0, 7.2.0, 7.1.1, 7.1.0, 7.0.1, 7.0.0, 7.0.0-doc, 7.0.0rc1, 7.0.0b1, or 6.2.1.

    ... (truncated)

    Commits

    dependencies python 
    opened by dependabot[bot]
  • Build(deps): Bump py from 1.4.32 to 1.10.0 in /infra/python/deps

    Bumps py from 1.4.32 to 1.10.0.

    Changelog

    Sourced from py's changelog.

    1.10.0 (2020-12-12)

    • Fix a regular expression DoS vulnerability in the py.path.svnwc SVN blame functionality (CVE-2020-29651)
    • Update vendored apipkg: 1.4 => 1.5
    • Update vendored iniconfig: 1.0.0 => 1.1.1

    1.9.0 (2020-06-24)

    • Add type annotation stubs for the following modules:

      • py.error
      • py.iniconfig
      • py.path (not including SVN paths)
      • py.io
      • py.xml

      There are no plans to type other modules at this time.

      The type annotations are provided in external .pyi files, not inline in the code, and may therefore contain small errors or omissions. If you use py in conjunction with a type checker, and encounter any type errors you believe should be accepted, please report it in an issue.

    1.8.2 (2020-06-15)

    • On Windows, py.path.locals which differ only in case now have the same Python hash value. Previously, such paths were considered equal but had different hashes, which is not allowed and breaks the assumptions made by dicts, sets and other users of hashes.
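The 1.8.2 entry rests on a Python rule: objects that compare equal must also hash equal, otherwise dicts and sets can store two "equal" keys separately or fail to find one. The minimal sketch below shows one way to keep the two consistent for case-insensitive paths. CaseInsensitivePath is a hypothetical class for illustration, not py's actual LocalPath code, and str.lower() is a simplification of real Windows case handling.

```python
class CaseInsensitivePath:
    """Hypothetical path wrapper that compares paths case-insensitively,
    roughly as a Windows file system does. Not part of the py library."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _key(self) -> str:
        # One normalized key shared by __eq__ and __hash__ keeps them in sync.
        return self.path.lower()

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CaseInsensitivePath):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Equal objects must produce equal hashes; hashing the same
        # normalized key used by __eq__ guarantees that.
        return hash(self._key())


a = CaseInsensitivePath(r"C:\Temp\Data.txt")
b = CaseInsensitivePath(r"c:\temp\data.txt")
print(a == b)              # True
print(hash(a) == hash(b))  # True
print(len({a, b}))         # 1: the set treats them as the same path
```

Deriving both methods from a single helper is the design choice that matters: if __eq__ normalized case but __hash__ hashed the raw string, the pair would compare equal yet usually land in different hash buckets, which is the bug the changelog describes.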

    1.8.1 (2019-12-27)

    • Handle FileNotFoundError when trying to import pathlib in path.common on Python 3.4 (#207).

    • py.path.local.samefile now works correctly in Python 3 on Windows when dealing with symlinks.

    1.8.0 (2019-02-21)

    • add "importlib" pyimport mode for python3.5+, allowing unimportable test suites to contain identically named modules.

    • fix LocalPath.as_cwd() not calling os.chdir() with None, when being invoked from a non-existing directory.

    ... (truncated)

    Commits
    • e5ff378 Update CHANGELOG for 1.10.0
    • 94cf44f Update vendored libs
    • 5e8ded5 testing: comment out an assert which fails on Python 3.9 for now
    • afdffcc Rename HOWTORELEASE.rst to RELEASING.rst
    • 2de53a6 Merge pull request #266 from nicoddemus/gh-actions
    • fa1b32e Merge pull request #264 from hugovk/patch-2
    • 887d6b8 Skip test_samefile_symlink on pypy3 on Windows
    • e94e670 Fix test_comments() in test_source
    • fef9a32 Adapt test
    • 4a694b0 Add GitHub Actions badge to README
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot]
  • Build(deps): Bump cryptography from 1.8.1 to 3.2 in /infra/python/deps

    Bumps cryptography from 1.8.1 to 3.2.

    Changelog

    Sourced from cryptography's changelog.

    3.2 - 2020-10-25

    • SECURITY ISSUE: Attempted to make RSA PKCS#1v1.5 decryption more constant time, to protect against Bleichenbacher vulnerabilities. Due to limitations imposed by our API, we cannot completely mitigate this vulnerability and a future release will contain a new API which is designed to be resilient to these for contexts where it is required. Credit to Hubert Kario for reporting the issue. CVE-2020-25659
    • Support for OpenSSL 1.0.2 has been removed. Users on older versions of OpenSSL will need to upgrade.
    • Added basic support for PKCS7 signing (including SMIME) via cryptography.hazmat.primitives.serialization.pkcs7.PKCS7SignatureBuilder.

    3.1.1 - 2020-09-22

    • Updated Windows, macOS, and manylinux wheels to be compiled with OpenSSL 1.1.1h.

    3.1 - 2020-08-26

    • BACKWARDS INCOMPATIBLE: Removed support for idna-based U-label parsing in various X.509 classes. This support was originally deprecated in version 2.1 and moved to an extra in 2.5.
    • Deprecated OpenSSL 1.0.2 support. OpenSSL 1.0.2 is no longer supported by the OpenSSL project. The next version of cryptography will drop support for it.
    • Deprecated support for Python 3.5. This version sees very little use and will be removed in the next release.
    • backend arguments to functions are no longer required and the default backend will automatically be selected if no backend is provided.
    • Added initial support for parsing certificates from PKCS7 files with cryptography.hazmat.primitives.serialization.pkcs7.load_pem_pkcs7_certificates and cryptography.hazmat.primitives.serialization.pkcs7.load_der_pkcs7_certificates.
    • Calling update or update_into on cryptography.hazmat.primitives.ciphers.CipherContext with data longer than 2^31 bytes no longer raises an OverflowError. This also resolves the same issue in Fernet.

    3.0 - 2020-07-20

    ... (truncated)

    Commits

    dependencies 
    opened by dependabot[bot]
Releases (cdh5.4.1-release)
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

Oryx Project 1.8k Dec 28, 2022
Hadoop library for large-scale data processing, now an Apache Incubator project

Apache DataFu Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by

LinkedIn's Attic 589 Apr 1, 2022
Apache Druid: a high performance real-time analytics database.

Druid is a high performance real-time a

The Apache Software Foundation 12.3k Jan 9, 2023
Mirror of Apache Storm

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processi

The Apache Software Foundation 6.4k Jan 3, 2023
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

Elephant Bird About Elephant Bird is Twitter's open source library of LZO, Thrift, and/or Protocol Buffer-related Hadoop InputFormats, OutputFormats,

Twitter 1.1k Jan 5, 2023
Google MR4C (License: GNU Lesser 3): an implementation framework that allows you to run native code within the Hadoop execution framework.

Introduction to the MR4C repo About MR4C MR4C is an implementation framework that allows you to run native code within the Hadoop execution framework.

Google 911 Dec 9, 2022
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.

All the files have been commented for your convenience, and you may add further comments. For further queries, contact me at: chhxnsh

Hassan Shahzad 5 Aug 14, 2021
A program that finds the average number of words per comment in a large data set, using Hadoop MapReduce to work efficiently in parallel.

Finding the average number of words in all the comments in a data set. Mapper function: in the mapper function we first tokenize the entire data and then fin

Aleezeh Usman 3 Aug 23, 2021
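The mapper/reducer split described above can be sketched in plain Python. This is a local simulation of the map and reduce phases under stated assumptions (one comment per input line, whitespace tokenization, a single shared key); it does not use the Hadoop API, and the function names are hypothetical rather than taken from the project.

```python
from typing import Iterable, Iterator, Tuple

# Local simulation only: no Hadoop API is used. Assumes one comment per
# input line, tokenized on whitespace.


def word_count_mapper(lines: Iterable[str]) -> Iterator[Tuple[str, Tuple[int, int]]]:
    # Emit one (key, (word_count, 1)) pair per comment. A single shared key
    # routes every pair to the same reducer.
    for line in lines:
        yield "avg", (len(line.split()), 1)


def average_reducer(values: Iterable[Tuple[int, int]]) -> float:
    # Sum words and comments separately, then divide once at the end.
    total_words = 0
    total_comments = 0
    for words, count in values:
        total_words += words
        total_comments += count
    return total_words / total_comments if total_comments else 0.0


comments = ["great product", "would not buy again", "ok"]
pairs = [value for _key, value in word_count_mapper(comments)]
print(average_reducer(pairs))  # 7 words over 3 comments
```

Emitting (word_count, 1) pairs instead of per-comment averages keeps the reduce step a plain sum, so a combiner could pre-aggregate pairs on each node without skewing the result; averaging partial averages would not have that property.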
A program that uses Hadoop MapReduce to identify the anagrams of the words in a file

Hadoop-MapReduce-Anagram-Solver The implementation consists of a program that utilizes the Hadoop Map-Reduce framework to identify the anagrams of the

Nikolas Petrou 2 Dec 4, 2022
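A common way to find anagrams with MapReduce is to key each word by its sorted letters, so that all anagrams meet at the same reducer. The sketch below simulates the map, shuffle, and reduce phases locally in Python; it is an assumption about the general technique, not the repository's actual code, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def anagram_mapper(text: str) -> Iterator[Tuple[str, str]]:
    # Key each word by its sorted letters: anagrams share the same key.
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        if word:
            yield "".join(sorted(word)), word


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Stand-in for Hadoop's shuffle/sort phase: group values by key.
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def anagram_reducer(words: List[str]) -> List[str]:
    # Deduplicate; only keys with two or more distinct words form an anagram set.
    distinct = sorted(set(words))
    return distinct if len(distinct) > 1 else []


text = "Listen, silent enlist. Google tinsel cat act"
for key, words in sorted(shuffle(anagram_mapper(text)).items()):
    group = anagram_reducer(words)
    if group:
        print(group)
# prints ['act', 'cat'] then ['enlist', 'listen', 'silent', 'tinsel']
```

Sorting letters gives a canonical key in O(k log k) per word of length k; a letter-count signature would also work and avoids the sort for long words.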
PageRank implementation in Hadoop

PageRank implementation in Hadoop. Use kiwenalu/hadoop-cluster-docker (set the cluster size to 5) to run the JAR. Load the dataset into memory using script

Maksym Zub 1 Jan 24, 2022
A platform for visualization and real-time monitoring of data workflows

Status: This project is no longer maintained. Twitter Ambrose is a platform for visualization and real-time monitoring of MapReduce data workfl

Twitter 1.2k Dec 31, 2022
The official home of the Presto distributed SQL query engine for big data

Presto Presto is a distributed SQL query engine for big data. See the User Manual for deployment instructions and end-user documentation.

Presto 14.3k Jan 5, 2023
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Heron is a realtime analytics platform developed by Twitter. It has a wide array of architectural improvements over its predecessor. Heron in Apache

The Apache Software Foundation 3.6k Dec 28, 2022
Apache Flink

Apache Flink Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Learn more about Flin

The Apache Software Foundation 20.4k Jan 5, 2023
Apache Hive

Apache Hive (TM) The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storag

The Apache Software Foundation 4.6k Dec 28, 2022
This code base is retained for historical interest only, please visit Apache Incubator Repo for latest one

Apache Kylin Apache Kylin is an open source Distributed Analytics Engine to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supp

Kylin OLAP Engine 561 Dec 4, 2022
Apache Dubbo vulnerability test demo and PoC

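DubboPOC: Apache Dubbo vulnerability PoCs, continuously updated. CVE-2019-17564, CVE-2020-1948, CVE-2020-1948 bypass, CVE-2021-25641, CVE-2021-30179, others. Disclaimer: this project is for learning purposes only; any direct or indirect consequences caused by unauthorized testing, and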

lz2y 19 Dec 12, 2022
A scalable, mature and versatile web crawler based on Apache Storm

StormCrawler is an open source collection of resources for building low-latency, scalable web crawlers on Apache Storm. It is provided under Apache Li

DigitalPebble Ltd 776 Jan 2, 2023
Flink CDC Connectors is a set of source connectors for Apache Flink

Flink CDC Connectors is a set of source connectors for Apache Flink, ingesting changes from different databases using change data capture (CDC). Flink CDC Connectors integrates Debezium as the engine to capture data changes.

null 6 Mar 23, 2022