Example code from Learning Spark book

Overview

Examples for Learning Spark

Examples for the Learning Spark book. These examples require a number of libraries and as such have long build files. We have also added a standalone example with minimal dependencies and a small build file in the mini-complete-example directory.

These examples have been updated to run against Spark 1.3, so they may differ slightly from the versions in your copy of "Learning Spark".

Requirements

  • JDK 1.7 or higher
  • Scala 2.10.3 (scala-lang.org)
  • Spark 1.3
  • Protobuf compiler (on Debian you can install it with sudo apt-get install protobuf-compiler)
  • R and the CRAN package Imap are required for the ChapterSixExample
  • The Python examples require urllib3

Python examples

From your Spark directory, run ./bin/pyspark ./src/python/[example]

Spark Submit

You can also create an assembly jar with all of the dependencies for running either the Java or Scala versions of the code, and run the job with the spark-submit script:

./sbt/sbt assembly OR mvn package
cd $SPARK_HOME; ./bin/spark-submit --class com.oreilly.learningsparkexamples.[lang].[example] ../learning-spark-examples/target/scala-2.10/learning-spark-examples-assembly-0.0.1.jar
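
The [lang] and [example] placeholders in the command above stand for the language package and the example class name. As a minimal sketch of how they combine, the hypothetical helper below (spark_submit_args is not part of this repository) builds the argument list; the pair scala / BasicParseJsonWithJackson is only an illustrative choice, and some examples may expect extra arguments that are not shown here.

```python
def spark_submit_args(lang, example, jar_path):
    """Build the spark-submit argument list for one example class.

    Hypothetical helper for illustration; it only assembles strings.
    """
    main_class = "com.oreilly.learningsparkexamples.{}.{}".format(lang, example)
    return ["./bin/spark-submit", "--class", main_class, jar_path]


# Jar path as given in the command above, relative to $SPARK_HOME.
JAR = "../learning-spark-examples/target/scala-2.10/learning-spark-examples-assembly-0.0.1.jar"

# Illustrative choice of language and example; substitute the one you built.
args = spark_submit_args("scala", "BasicParseJsonWithJackson", JAR)
print(" ".join(args))
```

Run the printed command from $SPARK_HOME, as in the original instructions.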


Comments
  • sbt assembly error


    After running "sbt assembly", I got the following error:

    [error] learning-spark/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJsonWithJackson.scala:47: not found: type ioRecord
    [error] Some(mapper.readValue(record, classOf[ioRecord]))
    [error] ^
    [error] learning-spark/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJsonWithJackson.scala:53: value lovesPandas is not a member of Nothing
    [error] result.filter(.lovesPandas).map(mapper.writeValueAsString())
    [error] ^
    [error] two errors found
    error Compilation failed
    [error] Total time: 24 s, completed Jun 17, 2015 10:00:31 PM

    Removing BasicParseJsonWithJackson.scala and recompiling results in other errors related to protocol buffers. Has anyone successfully built the project? How did you do it?

    Thanks!

    opened by yuchaoran2011 12
  • Bump jackson-databind from 2.3.3 to 2.9.10.7


    Bumps jackson-databind from 2.3.3 to 2.9.10.7.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 3
  • build failure due to protocol buffer


    Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    at sbt.SimpleProcessBuilder.run(ProcessImpl.scala:349)
    at sbt.AbstractProcessBuilder.run(ProcessImpl.scala:128)
    at sbt.AbstractProcessBuilder$$anonfun$runBuffered$1.apply(ProcessImpl.scala:159)
    at sbt.AbstractProcessBuilder$$anonfun$runBuffered$1.apply(ProcessImpl.scala:159)
    at sbt.BufferedLogger.buffer(BufferedLogger.scala:25)
    at sbt.AbstractProcessBuilder.runBuffered(ProcessImpl.scala:159)
    at sbt.AbstractProcessBuilder.$bang(ProcessImpl.scala:156)
    at sbtprotobuf.ProtobufPlugin$$anonfun$protobufSettings$6$$anonfun$apply$1.apply(ProtobufPlugin.scala:27)
    at sbtprotobuf.ProtobufPlugin$$anonfun$protobufSettings$6$$anonfun$apply$1.apply(ProtobufPlugin.scala:27)
    at sbtprotobuf.ProtobufPlugin$.executeProtoc(ProtobufPlugin.scala:66)
    at sbtprotobuf.ProtobufPlugin$.sbtprotobuf$ProtobufPlugin$$compile(ProtobufPlugin.scala:81)
    at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1$$anonfun$5.apply(ProtobufPlugin.scala:107)
    at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1$$anonfun$5.apply(ProtobufPlugin.scala:106)
    at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235)
    at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:249)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:245)
    at sbt.Difference.apply(Tracked.scala:224)
    at sbt.Difference.apply(Tracked.scala:206)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:245)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:244)
    at sbt.Difference.apply(Tracked.scala:224)
    at sbt.Difference.apply(Tracked.scala:200)
    at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:244)
    at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:242)
    at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1.apply(ProtobufPlugin.scala:109)
    at sbtprotobuf.ProtobufPlugin$$anonfun$sourceGeneratorTask$1.apply(ProtobufPlugin.scala:104)
    at scala.Function7$$anonfun$tupled$1.apply(Function7.scala:35)
    at scala.Function7$$anonfun$tupled$1.apply(Function7.scala:34)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:235)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    error error occured while compiling protobuf files: Cannot run program "protoc": error=2, No such file or directory
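
The final line of the trace says the build could not run "protoc", which suggests the Protobuf compiler listed under Requirements is not on the PATH. A minimal sketch for checking this before running the build, using only the Python standard library (the helper names find_tool and describe_tool are made up for this example):

```python
import shutil


def find_tool(name):
    """Return the absolute path of an executable on PATH, or None if absent."""
    return shutil.which(name)


def describe_tool(name):
    """Produce a one-line report saying where the tool lives, if anywhere."""
    path = find_tool(name)
    if path is None:
        return "{}: not found on PATH".format(name)
    return "{}: {}".format(name, path)


# "protoc" is the program name reported in the build error.
message = describe_tool("protoc")
print(message)
```

If the report says "not found on PATH", installing the Protobuf compiler (see Requirements) is the likely next step. This check does not verify that the installed version is one the build accepts.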

    opened by swadhawan 3
  • Difference between Running spark as local[*] Vs Yarn-client Vs Yarn-cluster in terms of performance


    Kindly consider this as an inquiry if not an issue.

    Hi, I am evaluating Spark for use at my work.

    We have an existing Hortonworks HDP 2.3 install.

    I am trying to work out whether I should use local, yarn-client, or yarn-cluster mode to submit a job in Spark.

    Consider running my job as:

    sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "local[*]" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt

    This completes the task in 14 seconds.

    When I run the same job as:

    sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-client" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt

    it takes 16 seconds.

    And this one:

    sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-cluster" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt

    takes 18 seconds.

    In the first case the job runs locally on one machine and takes less time, whereas in the later cases I am submitting the job to a cluster with 4 nodes.

    So can anyone tell me what the benefit of running the job on the cluster is, given that performance degrades there? Or is there a way to improve performance on the cluster?

    I would love to hear from someone about this urgently.

    ~Prashant

    opened by prashanttct07 2
  • Bump jackson-databind from 2.3.3 to 2.13.4.1


    Bumps jackson-databind from 2.3.3 to 2.13.4.1.


    dependencies 
    opened by dependabot[bot] 1
  • Bump jackson-databind from 2.3.3 to 2.12.6.1


    Bumps jackson-databind from 2.3.3 to 2.12.6.1.


    dependencies 
    opened by dependabot[bot] 1
  • Bump jackson-databind from 2.3.3 to 2.9.10.8


    Bumps jackson-databind from 2.3.3 to 2.9.10.8.


    dependencies 
    opened by dependabot[bot] 1
  • Bump jackson-databind from 2.3.3 to 2.9.10.4


    Bumps jackson-databind from 2.3.3 to 2.9.10.4.


    dependencies 
    opened by dependabot[bot] 1
  • Bump jackson-databind from 2.3.3 to 2.9.10.3


    Bumps jackson-databind from 2.3.3 to 2.9.10.3.


    dependencies 
    opened by dependabot[bot] 1
  • Bump jackson-databind from 2.3.3 to 2.9.10.1


    Bumps jackson-databind from 2.3.3 to 2.9.10.1.


    dependencies 
    opened by dependabot[bot] 1
  • Add unit tests for class ChapterSixExample


    Hi,

    I've analysed your codebase and produced some unit tests for one of the classes - ChapterSixExample

    I've written the tests for these functions with the help of Diffblue Cover.

    Hopefully, these tests should help you detect regressions caused by future code changes.

    opened by louismillsdiffblue 1
  • Bump jackson-databind from 2.3.3 to 2.12.7.1


    Bumps jackson-databind from 2.3.3 to 2.12.7.1.


    dependencies 
    opened by dependabot[bot] 0
  • [SECURITY] Use HTTPS to resolve dependencies in Maven Build




    This is a security fix for a high-severity vulnerability in your Apache Maven pom.xml file(s).

    The build files indicate that this project is resolving dependencies over HTTP instead of HTTPS. This leaves your build vulnerable to Man in the Middle (MITM) attackers, who could execute arbitrary code on your computer or CI/CD system.
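
To see which entries such a fix would touch, a pom.xml can be scanned for plain http:// URLs. The sketch below is a coarse check under stated assumptions: the function name insecure_repository_urls and the sample POM (with repo.example.org URLs) are invented for illustration, and the check flags every <url> element, including ones such as a project homepage that do not affect dependency resolution.

```python
import xml.etree.ElementTree as ET


def insecure_repository_urls(pom_xml):
    """Return the text of every <url> element that starts with http://."""
    root = ET.fromstring(pom_xml)
    urls = []
    for element in root.iter():
        # Drop any XML namespace prefix so "{ns}url" also matches "url".
        tag = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if tag == "url" and text.startswith("http://"):
            urls.append(text)
    return urls


# Hypothetical POM fragment, not taken from this repository.
SAMPLE_POM = """<project>
  <repositories>
    <repository>
      <id>example-insecure</id>
      <url>http://repo.example.org/maven2</url>
    </repository>
    <repository>
      <id>example-secure</id>
      <url>https://repo.example.org/maven2</url>
    </repository>
  </repositories>
</project>"""

found = insecure_repository_urls(SAMPLE_POM)
print(found)
```

Only the first repository in the sample is reported, since the second already uses https://.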

    This vulnerability has a CVSS v3.0 Base Score of 8.1/10.

    POC code has existed since 2014 to maliciously compromise a JAR file in-flight. MITM attacks against HTTP are increasingly common, for example Comcast is known to have done it to their own users.

    This contribution is a part of a submission to the GitHub Security Lab Bug Bounty program.

    Detecting this and Future Vulnerabilities

    This vulnerability was automatically detected by GitHub's LGTM.com using this CodeQL Query.

    You can automatically detect future vulnerabilities like this by enabling the free (for open-source) GitHub Action.

    I'm not an employee of GitHub, I'm simply an open-source security researcher.

    Source

    This contribution was automatically generated with an OpenRewrite refactoring recipe, which was lovingly hand crafted to bring this security fix to your repository.

    The source code that generated this PR can be found here: UseHttpsForRepositories

    Opting-Out

    If you'd like to opt-out of future automated security vulnerability fixes like this, please consider adding a file called .github/GH-ROBOTS.txt to your repository with the line:

    User-agent: JLLeitschuh/security-research
    Disallow: *
    

    This bot will respect the ROBOTS.txt format for future contributions.

    Alternatively, if this project is no longer actively maintained, consider archiving the repository.

    CLA Requirements

    This section is only relevant if your project requires contributors to sign a Contributor License Agreement (CLA) for external contributions.

    It is unlikely that I'll be able to directly sign CLAs. However, all contributed commits are already automatically signed-off.

    The meaning of a signoff depends on the project, but it typically certifies that committer has the rights to submit this work under the same license and agrees to a Developer Certificate of Origin (see https://developercertificate.org/ for more information).

    - Git Commit Signoff documentation

    If signing your organization's CLA is a strict-requirement for merging this contribution, please feel free to close this PR.

    Sponsorship & Support

    This contribution is sponsored by HUMAN Security Inc. and the new Dan Kaminsky Fellowship, a fellowship created to celebrate Dan's memory and legacy by funding open-source work that makes the world a better (and more secure) place.

    This PR was generated by Moderne, a free-for-open-source SaaS offering that uses format-preserving AST transformations to fix bugs, standardize code style, apply best practices, migrate library versions, and fix common security vulnerabilities at scale.

    Tracking

    All PRs generated as part of this fix are tracked here: https://github.com/JLLeitschuh/security-research/issues/8

    opened by JLLeitschuh 0
  • Basic ideas to solve Spark OOM: Count all the high-frequency words in a big table


    The detailed question is:

    I want to count all the high-frequency words in a big table.

    I split the sentence in each row into words, flatMap to one word per row, group by word, and then count the words in each group.

    The job then runs out of memory (OOM).
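
One common idea for this kind of job is to keep only a running count per word and merge partial counts, instead of first collecting every occurrence of a word into a group. The plain-Python sketch below illustrates that idea on hypothetical data; it is not Spark code, and the function names are made up. In PySpark the analogous pipeline would map each word to (word, 1) and combine with reduceByKey rather than grouping first; whether that resolves the poster's OOM is an assumption, since the original job and data are not shown.

```python
from collections import Counter
from functools import reduce


def count_partition(sentences):
    """Count words in one partition, holding only one integer per distinct word."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def merge_counts(left, right):
    """Merge two partial count tables by adding counts for matching words."""
    left.update(right)
    return left


def high_frequency_words(partitions, min_count):
    """Combine per-partition counts, then keep words seen at least min_count times."""
    partials = (count_partition(p) for p in partitions)
    total = reduce(merge_counts, partials, Counter())
    return {word: n for word, n in total.items() if n >= min_count}


# Hypothetical data: two partitions of sentences.
partitions = [
    ["spark is fast", "spark is fun"],
    ["pandas love spark"],
]
result = high_frequency_words(partitions, min_count=2)
print(result)
```

The memory held per word is a single counter rather than a list of all its occurrences, which is the point of combining before (or instead of) grouping.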

    opened by guotong1988 0
  • sbt.librarymanagement.ResolveException: Error downloading com.github.gseitz:sbt-protobuf;sbtVersion=1.0;scalaVersion=2.12:0.3.3


    I'm trying to run this project, but when I import the sbt build I get the following error message:

    sbt.librarymanagement.ResolveException: Error downloading com.github.gseitz:sbt-protobuf;sbtVersion=1.0;scalaVersion=2.12:0.3.3

    Any ideas? I've reinstalled all the required packages and everything seems to be fine.

    opened by MiloVentimiglia 0
  • [SECURITY] Use HTTPS to resolve dependencies in Maven Build




    This is a security fix for a vulnerability in your Apache Maven pom.xml file(s).

    The build files indicate that this project is resolving dependencies over HTTP instead of HTTPS. This leaves your build vulnerable to Man in the Middle (MITM) attackers, who could execute arbitrary code on your computer or CI/CD system.

    This vulnerability has a CVSS v3.0 Base Score of 8.1/10.

    POC code has existed since 2014 to maliciously compromise a JAR file in-flight. MITM attacks against HTTP are increasingly common, for example Comcast is known to have done it to their own users.

    This contribution is a part of a submission to the GitHub Security Lab Bug Bounty program.

    Detecting this and Future Vulnerabilities

    This vulnerability was automatically detected by LGTM.com using this CodeQL Query.

    As of September 2019 LGTM.com and Semmle are officially a part of GitHub.

    You can automatically detect future vulnerabilities like this by enabling the free (for open-source) LGTM App.

    I'm not an employee of GitHub nor of Semmle, I'm simply a user of LGTM.com and an open-source security researcher.

    Source

    Yes, this contribution was automatically generated; however, the code to generate this PR was lovingly handcrafted to bring this security fix to your repository.

    The source code that generated and submitted this PR can be found here: JLLeitschuh/bulk-security-pr-generator

    Opting-Out

    If you'd like to opt-out of future automated security vulnerability fixes like this, please consider adding a file called .github/GH-ROBOTS.txt to your repository with the line:

    User-agent: JLLeitschuh/bulk-security-pr-generator
    Disallow: *
    

    This bot will respect the ROBOTS.txt format for future contributions.

    Alternatively, if this project is no longer actively maintained, consider archiving the repository.

    CLA Requirements

    This section is only relevant if your project requires contributors to sign a Contributor License Agreement (CLA) for external contributions.

    It is unlikely that I'll be able to directly sign CLAs. However, all contributed commits are already automatically signed-off.

    The meaning of a sign-off depends on the project, but it typically certifies that the committer has the rights to submit this work under the same license and agrees to a Developer Certificate of Origin (see https://developercertificate.org/ for more information).

    - Git Commit Signoff documentation
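For projects that want to verify the sign-off mentioned above, a minimal Python sketch could look for the `Signed-off-by:` trailer that `git commit --signoff` appends to a commit message. The helper name `has_signoff` and the sample name and address are hypothetical, and the pattern is deliberately loose rather than a full validation of the trailer.

```python
import re

# A sign-off trailer has the form "Signed-off-by: Name <email>" on its own line.
SIGNOFF_PATTERN = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE
)


def has_signoff(commit_message):
    """Return True if the commit message contains a Signed-off-by trailer."""
    return SIGNOFF_PATTERN.search(commit_message) is not None


# Hypothetical commit message used only for this example.
SAMPLE_MESSAGE = (
    "Use HTTPS to resolve dependencies in Maven Build\n"
    "\n"
    "Signed-off-by: Jane Doe <jane@example.org>\n"
)

if __name__ == "__main__":
    print(has_signoff(SAMPLE_MESSAGE))
```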

    If signing your organization's CLA is a strict requirement for merging this contribution, please feel free to close this PR.

    Tracking

    All PRs generated as part of this fix are tracked here: https://github.com/JLLeitschuh/bulk-security-pr-generator/issues/2

    opened by JLLeitschuh 0
Owner
Databricks
Helping data teams solve the world’s toughest problems using data and AI